Data processing circuits and interfaces

ABSTRACT

An integrated circuit contains a microprocessor core, program memory and separate data storage, together with analogue and digital signal processing circuitry. The ALU is 16 bits wide, but a 32-bit shift unit is provided, using a pair of 16-bit registers. The processor has a fixed length instruction format, with an instruction set including multiply and divide operations which use the shift unit over several cycles. No interrupts are provided. External pins of the integrated circuit allow for single stepping and other debug operations, and a serial interface (SIF) which allows external communication of test data or working data as necessary. The serial interface has four wires (SERIN, SEROUT, SER-CLK, SERLOADB), allowing handshaking with a master apparatus, and allowing direct access to the memory space of the processor core, without specific program control. Within each processor cycle, the processor circuitry is divided into plural stages, and latches are interposed between the stages to minimize power consumption.

BACKGROUND OF THE INVENTION

The present invention relates primarily to single-chip data processing devices, but also to microprocessors and to digital circuits generally, and to interface circuits.

In the present day, many products incorporate microprocessor based data processing circuits, for example to process signals, to control internal operation and/or to provide communications with users and external devices. To provide compact and economical solutions, particularly in mass-market portable products, it is known to include microprocessor functionality together with program and data storage and other specialised circuitry, in a custom “chip” also known as an application-specific integrated circuit (ASIC). Field Programmable Gate Arrays (FPGA) such as those made by Xilinx™, Actel™ and Altera™ may also be used to implement such solutions.

However, for various reasons, the integrated microprocessor functionality conventionally available to an ASIC designer tends to be the same as that which would be provided by a microprocessor designed for use as a separate chip. The present inventors have recognised that this results in inefficient use of space and power in the ASIC solution, and in fact renders many potential applications of ASIC technology impractical and/or uneconomic.

SUMMARY OF THE INVENTION

Various aspects of the invention are defined in the appended claims, while the applicant reserves the right to claim any further aspects of the invention that may be disclosed herein.

In accordance with certain aspects of the present invention, microprocessor architectures are proposed which overcome the above drawbacks, being optimised for integration within an ASIC by providing a combination of functional features and architectural features unlike any conventional microprocessor.

For example, in a conventional general purpose microprocessor, the arithmetic and logic unit (ALU) has a certain data width (eight bits, sixteen bits etc.), and provides operations of arithmetic addition or subtraction, logical AND, OR combinations and left and right bit shifts, all on data of this basic width. One aspect of the invention disclosed herein is to provide a separate shifting unit, wider than the ALU width, which allows multiplication and division of two numbers, each as wide as the ALU itself, in a circuit of relatively small size. The shifter will typically be associated with one double-width register of the processor.

Other aspects of the invention relate to the provision of special functional features and interfaces within the chip and/or between the chip and the external environment. While these other aspects can be employed advantageously in the novel processor architecture proposed herein, it will be apparent that these specific techniques are in fact applicable and advantageous in a wide range of different microprocessor architectures, or even in the field of sequential digital circuitry generally irrespective of whether it is program-controlled or not.

As one particular such feature, the invention in another aspect provides, in a program controlled processor, a mechanism whereby response to external stimuli is provided automatically, but only at times known in advance to the programmer. Examples of such stimuli include requests for communication from external devices, and entry of “sleeping” state for power conservation. In the present embodiments, special instructions are defined whereby the programmer can define fixed periods in which external communication may take place, and fixed points for entry into the sleeping state.

The various aspects of the invention will become apparent from the following description of specific embodiments. These are presented by way of example only, with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows the basic arrangement of an integrated circuit including a processor embodying the invention;

FIG. 2 shows the programmer's model and instruction format of the processor of the FIG. 1 circuit;

FIG. 3 shows the data architecture of the processor;

FIGS. 4A and 4B illustrate the execution of multiply and divide operations in the processor of FIG. 3;

FIGS. 5A to 5G show waveforms illustrating various functional features of the circuit of FIG. 1;

FIG. 6 is a schematic diagram of a serial interface (SIF) similar to that of the circuit of FIG. 1;

FIG. 7 shows in more detail a shift register of the serial interface of FIG. 6;

FIG. 8 is a flowchart showing operation of a master apparatus (EXT) in relation to the serial interface of FIGS. 6 and 7;

FIG. 9 is a flowchart showing operation of a slave apparatus (ASIC) in relation to the serial interface;

FIG. 10 shows circuitry for monitoring external signals during a sleeping state of the processor of FIGS. 1 to 6;

FIG. 11 is a schematic diagram of the electronics for a domestic gas meter, including the novel processor; and

FIG. 12 shows a modification of the serial interface of FIGS. 6 and 7, allowing multiple slave devices.

The description of the embodiments which follows includes the following sections:

Overview of the Specific Embodiment including a novel processor core

Basic Arrangement of the processor and associated hardware integrated within an ASIC

Programmer's Model for the processor

Instruction Set for the processor

Data Architecture of the processor

Pin Description together with detailed operation

Construction and Operation of the Serial Interface (SIF)

Power Saving Features

Application Example—Domestic Gas Meter

Further Notes and alternative embodiments.

OVERVIEW OF THE SPECIFIC EMBODIMENT

Single chip solutions are ideal for many products, and can comprise mixed mode CMOS circuitry such that analogue and digital signal processing can be performed on-chip. For some applications, however, there is a further need for more logic processing, for functions such as user interfaces, controlling E²PROMs, networks and protocol conversions. Conventionally this requires a traditional microprocessor or microcontroller, which increases product size for two reasons. Firstly, the circuit board of a product has two chips instead of one, and secondly, the stand-by modes of conventional low-power processors still consume substantial power, so that the product requires a larger battery. The larger battery and bigger circuit board make the product quite expensive, while the need for communication between chips makes the system more sensitive to electromagnetic interference. There are many other disadvantages of a multiple chip solution.

There is clearly a desire for single chip solutions in which the processor, program storage and RAM are embedded in a single ASIC, to facilitate products such as: portable wireless products, instrumentation, utilities metering systems, low data rate radio systems, medical diagnostics, safety critical and verifiable systems, pagers, and certain sections of mobile telephones. Certain conventional designs can be reduced to a single ASIC by embedding a processor or a memory corresponding to the conventional external processor, if the ASIC can be made large enough. However, this goes only a small way to addressing the problems identified above.

The processor which is the subject of the present disclosure is a custom microprocessor which has been developed for use in ASIC designs. The requirements and trade-offs when designing for an ASIC are quite different from discrete designs, so the novel processor differs in a number of important aspects from a conventional microprocessor. As a result, ASIC designs that use the novel processor are lower cost, lower power and lower risk than those using existing processor designs. Some key features of the processor are as follows.

A very low gate count of, for example, 3000 gates can be achieved, which compares with 8000 gates for an Intel 8051 microprocessor core. To efficiently support signal processing applications, the novel processor has powerful signed arithmetic functions including hardware multiply (16 bits by 16 bits), divide (32 bits by 16 bits) and normalisation, which are conventionally only found in much larger processor designs. In particular, a shift unit separate from the conventional ALU provides double-width shift operations, in cooperation with a pair of registers.

Power consumption is minimised in three ways. Firstly, the architecture and instruction set have been chosen for power efficiency, especially the low power implementation of signal processing functions. Secondly, the detailed design minimises clatter and unnecessary transitions. Thirdly, when idle, the processor can enter a sleep mode, where it uses very little power. Furthermore, by use of static circuits, the circuit can stop the processor clock completely without loss of data, giving virtually zero power consumption. Special asynchronous circuitry monitors external wake-up signals with minimum power dissipation. These features complement the inherently lower power achievable with single chip designs to make possible new applications.

A built-in serial interface (the SIF) allows external access to the address space and processor registers. It is used to provide efficient IC manufacturing test (allowing testing of the processor, ROM, RAM and memory mapped peripherals), prototype device proving, and production board level test. It can also be used for ASIC to microprocessor communication in systems that use a tied external microprocessor, eliminating the need for separate interfaces and on-board circuitry for communications and testability functions.

The processor uses separate program and data spaces (a Harvard architecture), and the program instruction word can thus be wider or narrower than the data word. This is an example of where the on-chip nature of the design favours a different solution to the traditional microprocessor (von Neumann architecture) because the need for separate address busses on-chip does not cause any increase in the number of connections (pins) off-chip. On both address busses, the timing allows either synchronous or asynchronous devices. This is important as synchronous devices are often smaller and lower power than their asynchronous counterparts.

For verification and also to reduce gate count, the processor does not support interrupts. Instead, efficient polling instructions and the use of sleep/wake-up allow efficient multi-event responses. The processor has a RISC instruction set. Instructions are single word and mostly execute in a single cycle. Particular operations taking several cycles to execute are the Multiply (16 cycles), Divide (16 cycles) and Shift (variable) operations, which use the special shift unit mentioned above. There are four addressing modes: Immediate, Direct and Indexed by registers X and Y.

The processor is suitable for programming in a high level language such as ‘C’, or in assembler. The instruction set and addressing modes allow efficient, readable assembler programs to be easily written. The registers and addressing modes also allow effective compilation of high level languages such as C. However, it may be that, for efficiency and verification reasons, programs are best written in assembler, although development effort may be higher.

Basic Arrangement

FIG. 1 shows the basic arrangement of the novel processor core and associated hardware included in a single integrated circuit. The “core” of the processor is shown at 100, while the boundary of the integrated circuit is shown at 102. Various “pins” of the processor are defined, for communication with other functional units on the chip, and with the external world.

Also included on chip is program memory 104, fixed data memory (ROM) 106, data RAM 108 and memory-mapped input/output circuitry 110. The program space address bus is 16 bits wide and labelled PROG_ADDR, while the data space address bus (also 16 bits) is labelled ADDR.

“Pins” of the processor core which are connected to physical pins at the chip boundary 102 are related to the debug functions (pins STOPB, RUN_STEP) and to the serial interface (SER_IN, SER_OUT, SER_CLK, and SER_LOADB).

The person skilled in the art will appreciate that, in addition to the elements explicitly shown in the Basic Arrangement of FIG. 1, other analogue and/or digital signal processing functions will typically be provided within the ASIC itself, according to the particular application. Where high speed signal processing functions are to be performed by the ASIC, it will typically be desirable to have these performed by dedicated circuitry rather than by the programmed processor core described above, and to have less computationally intensive, but more logically complex parts of the required functionality implemented by the processor under program control. Therefore, for example, if a number of external analogue signals are to be sensed with a high bandwidth, and combined in accordance with predetermined repetitive algorithms to obtain a meaningful measurement, the A-D conversion and these high-speed processing functions can be performed by dedicated circuitry, and the measured value supplied periodically to the processor core via the memory mapped input/output circuitry. This principle is applied in the gas flow meter disclosed below as an example application, with reference to FIG. 11.

Programmer's Model

FIG. 2 shows the logical arrangement of internal registers and memory space of the processor of FIG. 1, commonly known as the programmer's model. Registers AH, AL, X and Y are provided for transient storage of data and address values within the processor core, together with a register for the program counter PC, and for various flags (C, S, N, Z) generated by the arithmetic and logic circuits of the processor.

The program storage space (memory 104) comprises 64K (65536 decimal) locations, each storing an 18 bit instruction word. The instruction words are of fixed format, and the format is illustrated at the foot of FIG. 2. Bits numbered 17 (MSB) down to 8 of the instruction word (hereinafter INSTRN[17:8]) contain an address value or data for the instruction. Bits INSTRN[7:4] contain an operation code (opcode) for the operation to be performed. Bits INSTRN[3:2] can be used to extend the operation opcode, and, in particular, are used by certain instructions to identify a particular storage register from among those shown in the programmer's model. Finally, bits INSTRN[1:0] identify an addressing mode for the current instruction.
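By way of illustration only, the fixed instruction format described above can be unpacked as in the following Python sketch. The names Fields and decode are hypothetical and are introduced here purely for the example; the numeric encodings of the opcodes, registers and addressing modes are not given in this description, so the sketch returns the raw field values only.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    """Raw fields of one 18-bit instruction word (hypothetical container)."""
    addr10: int  # INSTRN[17:8]: 10-bit address value or data
    opcode: int  # INSTRN[7:4]: operation code
    reg: int     # INSTRN[3:2]: opcode extension / register select
    mode: int    # INSTRN[1:0]: addressing mode


def decode(instrn: int) -> Fields:
    """Split an 18-bit instruction word into its four fixed fields."""
    if not 0 <= instrn < (1 << 18):
        raise ValueError("instruction word must fit in 18 bits")
    return Fields(
        addr10=(instrn >> 8) & 0x3FF,
        opcode=(instrn >> 4) & 0xF,
        reg=(instrn >> 2) & 0x3,
        mode=instrn & 0x3,
    )
```

Because every instruction is a single word of the same layout, the decode step needs no state and no look-ahead, which is consistent with the low gate count aimed at in this embodiment.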

In the data space (formed by memories 106, 108, 110) there are 64K (65536 decimal) locations, each storing a 16 bit data word. 512 locations at the beginning of the data space (0000 to 01FF), and 512 locations at the end of the data space (FE00 to FFFF) can be addressed directly with the 10 address bits of each instruction word. The remaining 63K locations can be addressed with indirect (for example indexed) addressing modes.

The instruction set implemented in this example processor will now be illustrated in the format of assembler language mnemonics, which will be readily understood by those familiar with the design or use of microprocessors.

INSTRUCTION SET

ASSEMBLER          OPERATION                               FLAGS SET
NOP                None (NB instruction all zeroes)        -
SLEEP              Enter sleep mode
PRINT              None (debug request for simulator)      -
SIF                Perform SIF access during instrn        -
LD reg,data        reg ← data                              N Z
ST reg,data        data ← reg                              N Z
ADD reg,data       reg ← reg + data                        C S N Z
ADDC reg,data      reg ← reg + data + C                    C S N Z
SUB reg,data       reg ← reg − data                        C S N Z
SUBC reg,data      reg ← reg − data − C                    C S N Z
NADD reg,data      reg ← −reg + data                       C S N Z
CMP reg,data       flags ← reg − data                      C S N Z
MULT data          A ← AH + AL × data                      -
DIV data           AL ← A>>1 ÷ data, AH[15:1] ← rem        -
TST data           flags ← data                            N Z
BSR branch_addr    X ← PC + 1, PC ← branch_addr            -
SAL data           C ← [AH,AL] ← 0                         C
SAR data           AH[15] → [AH,AL] → C                    C
SCL data           C ← [AH,AL] ← C                         C
SCR data           C → [AH,AL] → C                         C
                   (shift count ct = data[3:0], 0..15)
OR reg,data        reg ← reg | data                        N Z
AND reg,data       reg ← reg & data                        N Z
XOR reg,data       reg ← reg ^ data                        N Z
BRA branch_addr    PC ← branch_addr                        -
BLT branch_addr    If S = 1 PC ← branch_addr               -
BPL branch_addr    If N = 0 PC ← branch_addr               -
BMI branch_addr    If N = 1 PC ← branch_addr               -
BNE branch_addr    If Z = 0 PC ← branch_addr               -
BEQ branch_addr    If Z = 1 PC ← branch_addr               -
BCC branch_addr    If C = 0 PC ← branch_addr               -
BCS branch_addr    If C = 1 PC ← branch_addr               -

ADDRESSING MODES

Address mode for instructions with “data”

Addr mode      Assembler format    “data” source/destination
Immediate      #12                 data = addr16
Direct         @12                 data = @addr16
Indexed X      @(12,X)             data = @(X+addr16)
Indexed Y      @(12,Y)             data = @(Y+addr16)

Address mode for instructions with “branch_addr”

Addr mode      Assembler format    “branch_addr” new PC value
PC relative    $+12                branch_addr = PC + addr16
Direct         @12                 branch_addr = @addr16
X relative     12,X                branch_addr = X+addr16
Indexed Y      @(12,Y)             branch_addr = @(Y+addr16)

In the above definition of the instruction set, and as mentioned above in relation to FIG. 2, the bits INSTRN[3:2] of the instruction word are used to identify registers in instructions having the operand “reg” only. Also: the ‘@’ indicates “contents of”; ‘addr16’ is the 10 bit address value INSTRN[17:8] sign-extended to 16 bits; ‘12’ is simply an example data value; while ‘PC’ indicates the address of the current instruction.

Data Architecture

FIG. 3 shows in block form the data architecture of the processor core. The operation of all elements in a given cycle is controlled in accordance with one program instruction by a control and decode section 300. The principal elements of the data architecture are the arithmetic and logic unit (ALU) 302; the A register 304 (32 bits comprising AH and AL); X register 306 and Y register 308 (16 bits each); program counter (PC) register 310 (16 bits); shift/load logic 312; a PC incrementer 314, comprising a 16 bit half adder; an address sign extender 316; an address adder 318; various multiplexers 320 to 332; a flags and condition logic 334; and a tri-state buffer 335.

Communication with the program memory is via the program address bus PROG_ADDR and the instruction bus INSTRN. Communication with the data memory (including ROM, RAM and memory mapped I/O circuits) is via the data address output ADDR and the bi-directional data bus DATA. Also provided is the serial interface register having an address part 336 (bits SIF_ADDR [15:0]), an address space determining part 338, a read/write control part 340, and a data part 342 of eighteen bits. Also associated with the serial interface is a tri-state buffer 346.

The operation of the processor in execution of instructions will now be described.

Instructions are read from the INSTRN bus, where they have been latched after read-out from the program ROM 104. The lowest eight bits of the latched instruction I[7:0] determine the operation to be performed and are passed to the control and decode section 300 which sets up the data architecture multiplexers 320 to 332 and controls the sequence of instruction execution. The top 10 bits of the latched instruction I[17:8] specify an address or immediate value for the instruction.

As well as instruction execution, the data architecture provides the pathways for SIF operations. The architecture will now be described as it applies in detail to each class of instruction and SIF operation.

ALU Basic Functions

The ALU 302 has two inputs A and B of 16 bits each, and a 16 bit output Σ. A carry bit input C0 is provided, and flag outputs Cn, S (sign), N (negative) and Z (zero). The basic functions implemented by the ALU under control of the control and decode section 300 are listed within the block 302 in FIG. 3. The options in each cycle are: (1) to perform a logical AND of the A and B inputs; (2) to add the negative of A to B while subtracting the input carry bit C0; (3) to pass B directly to Σ; (4) to subtract B from A with subtraction of the carry bit C0, outputting Cn=1 in the case of a “borrow”; (5) to output the logical exclusive-OR (XOR) of A and B; (6) to output A+B+C0 (add with carry); and (7) to output the logical OR of A and B. The majority of instructions are two operand instructions, taking one operand from a register, the other from memory or as an immediate, and returning the result to the same register. The ALU A input receives the register input and the ALU B input the memory/immediate value. The ALU performs the required logical or arithmetic operation and presents the result on the Σ output from where it can be written back into the same register. For the AL/AH registers 304, the additional shift/load block 312 is inserted between Σ and the registers. This allows shift operations to be performed on the combined register pair and is also an integral part of the scheme by which multiply and divide are performed in this embodiment.
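The seven ALU options listed above may be modelled, purely as a behavioural sketch, by the following Python function. Several points are assumptions made for the example and are not stated in this description: the operation names are hypothetical labels; Cn is taken to be the carry out of bit 15 for the addition and a “borrow” indicator for both subtracting options; N is taken to be bit 15 of Σ. The S (sign) output is deliberately not modelled, since its exact derivation is not defined here.

```python
MASK16 = 0xFFFF


def alu(op: str, a: int, b: int, c0: int = 0) -> dict:
    """Behavioural model of the seven basic ALU functions (flag S omitted)."""
    a &= MASK16
    b &= MASK16
    cn = 0
    if op == "AND":            # option (1)
        result = a & b
    elif op == "NADD":         # option (2): -A + B - C0
        full = b - a - c0
        cn = 1 if full < 0 else 0          # assumed: borrow indicator
        result = full & MASK16
    elif op == "B":            # option (3): pass B to the output
        result = b
    elif op == "SUB":          # option (4): A - B - C0, Cn=1 on borrow
        full = a - b - c0
        cn = 1 if full < 0 else 0
        result = full & MASK16
    elif op == "XOR":          # option (5)
        result = a ^ b
    elif op == "ADD":          # option (6): A + B + C0
        full = a + b + c0
        cn = full >> 16                    # assumed: carry out of bit 15
        result = full & MASK16
    elif op == "OR":           # option (7)
        result = a | b
    else:
        raise ValueError(f"unknown ALU operation: {op}")
    return {"sigma": result, "cn": cn, "n": result >> 15, "z": int(result == 0)}
```

For instance, alu("SUB", 3, 5) gives Σ = FFFE with Cn = 1 and N = 1, corresponding to a borrow and a negative result.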

No Operand Instructions: NOP, BRK, PRINT, SIF

NOP, BRK, PRINT and SLEEP do not involve the data architecture at all. SIF executes as a SIF cycle if a SIF request is pending (described more fully below), otherwise it behaves as a no-op.

Data Address Modes

Memory/immediate values are generated and applied to the ALU B input, using the multiplexers 320, 322, 324, 328 and neighbouring components. There are four data address modes: Immediate, Direct, Indexed X and Indexed Y. Immediate takes a value directly from the instruction (I[17:8]). The other modes use the instruction value to select a value from memory. The top 10 bits of the latched instruction are sign extended at 316 to 16 bits. To this is added (318) either zero (Immediate or Direct), X (Indexed X) or Y (Indexed Y) as selected by multiplexer 324. The output of adder 318 is fed to the ADDR bus via multiplexer 326. For immediate values multiplexer 328 passes the output of adder 318 directly to the ALU B input. For other modes, multiplexer 328 selects the DATA input, containing the value read from memory.
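The generation of the ALU B operand described above can be illustrated with the following Python sketch. The function names, the string labels used for the four modes and the use of a dictionary to stand for the data space are all assumptions made for the example only; the hardware encodings of the modes are not given in this description.

```python
def sign_extend10(value10: int) -> int:
    """Sign-extend the 10-bit instruction field to 16 bits (extender 316)."""
    value10 &= 0x3FF
    return value10 | 0xFC00 if value10 & 0x200 else value10


def alu_b_operand(mode: str, value10: int, x: int, y: int, memory: dict) -> int:
    """Return the value presented on the ALU B input for a data address mode."""
    addr16 = sign_extend10(value10)
    # Multiplexer 324: zero for Immediate/Direct, X or Y for the indexed modes.
    offset = {"immediate": 0, "direct": 0, "indexed_x": x, "indexed_y": y}[mode]
    adder_out = (addr16 + offset) & 0xFFFF      # address adder 318
    if mode == "immediate":
        return adder_out                         # multiplexer 328, adder output
    return memory[adder_out]                     # multiplexer 328, DATA input
```

With this model, a direct address field of 200 (hexadecimal) sign-extends to FE00, which shows why the directly addressable locations are 0000 to 01FF and FE00 to FFFF.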

Single Memory/Immediate Operand Instructions: LD, TST

The appropriate memory/immediate value is generated on the ALU B path as described above. This is passed unchanged through the ALU 302 by setting it to ‘B’. In the case of LD, the result Σ is written to the appropriate register. In both cases, the N and Z flag values from the ALU are loaded into their flag bits.

Store Instruction: ST (and PRINT)

This is the only instruction that writes to memory. The register whose value is to be stored is selected by multiplexer 320 on the ALU A path. This is then routed via multiplexer 322 and tri-state buffer 335 to the DATA bus. The store address is formed in exactly the same way as for a normal memory/immediate operand instruction. Storing the data directly, rather than via the ALU, avoids the need for special timing for this instruction.

The ST instruction sets the flags. For the valid addressing modes, multiplexer 328 is always set to 1 (DATA), so the stored value is also presented on the ALU B path. By selecting ALU operation ‘B’, the ALU generates N and Z flag values which can be loaded into their flag bits.

The opcode for ST with the Immediate addressing mode does not make sense and is used for the PRINT instruction. This has special meaning to a simulator and gate level debugger for use in developing the ASIC, but is executed by the processor as a no-op.

Dual Operand Instructions: ADD, ADDC, SUB, SUBC, NADD, CMP, OR, AND, XOR

These instructions operate between a register and a memory/immediate value, returning the result to the same register. The register value is presented on ALU input A via multiplexer 320. The memory/immediate value is presented on ALU B input as above.

The result Σ from the ALU is fed to the appropriate register. For these instructions multiplexer 330 and the shift/load unit 312 are both set so that Σ is propagated unchanged to X, AH and AL as well as Y. The appropriate register only is clocked.

Arithmetic instructions set all four FLAG bits out of the ALU. Logical instructions set only the N and Z bits.

Branch Instructions: BRA, BLT, BPL, BNE, BEQ, BCC, BCS

Normally multiplexer 332 is set to 0 so that at the end of each instruction, PC is clocked and the program counter increments. PROG_ADDR, the address in program space, is equal to the PC value.

When a branch instruction is executed, if BRANCH_TRUE is high (output by the condition logic 334), multiplexer 332 is switched to 1 so as to load a new PC value from the ALU output. Otherwise, the PC increments, as normal. BRANCH_TRUE checks the branch condition by reference to the appropriate flag bits, using a multiplexer which is hard wired to the appropriate instruction code bits.

The branch address (new PC value) can be specified using one of four addressing modes. These are similar to the data addressing modes and use the same hardware to generate them. The four modes are Direct, PC relative, X relative and Indexed Y. Multiplexer 324 is set to add the correct register/value (inputs 0 to 3 respectively) to the sign extended value specified in the instruction. Multiplexer 328 selects its input 0 for immediate modes PC relative and X relative, and selects 1 for Direct and Indexed Y.

Note that the PC value used for PC relative addressing is that of the branch instruction itself, as the processor does not pre-increment the program counter. One use of the X relative mode is to return from subroutines (as the return address is stored in the X register).

The new PC value is presented on the ALU B path. The ALU is set to ‘B’ and the value routed through to Σ where it can be loaded into PC.
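As an illustration of the four branch addressing modes, the following Python sketch computes the new PC value in each case, following the definitions in the ADDRESSING MODES table. The function names, the string mode labels and the dictionary standing for the data space are hypothetical and serve the example only.

```python
def sign_extend10(value10: int) -> int:
    """Sign-extend the 10-bit instruction field to 16 bits."""
    value10 &= 0x3FF
    return value10 | 0xFC00 if value10 & 0x200 else value10


def branch_target(mode: str, value10: int, pc: int, x: int, y: int,
                  memory: dict) -> int:
    """Return the new PC value; pc is the address of the branch itself."""
    addr16 = sign_extend10(value10)
    if mode == "pc_relative":      # branch_addr = PC + addr16
        return (pc + addr16) & 0xFFFF
    if mode == "x_relative":       # branch_addr = X + addr16
        return (x + addr16) & 0xFFFF
    if mode == "direct":           # branch_addr = @addr16
        return memory[addr16]
    if mode == "indexed_y":        # branch_addr = @(Y + addr16)
        return memory[(y + addr16) & 0xFFFF]
    raise ValueError(f"unknown branch mode: {mode}")
```

A subroutine return then corresponds to an X relative branch with a zero offset, since BSR leaves PC+1 in the X register.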

Branch to Subroutine Instruction: BSR

BSR is identical to the unconditional branch BRA, except that the return address is stored in the X register (return address is the current PC+1). This is achieved by the 1 input of the multiplexer 330 on the input to the X register, which allows the incremented PC to be loaded into X whilst the main ALU pathway is used to transfer the new PC value of the branch address. Multiplexer 330 selects this input only for this instruction.

Shift Instructions: SAL, SCL, SAR, SCR

The shift instructions operate on the combined AL/AH register ‘A’ and are performed entirely locally to the shift/load logic 312; the ALU is not involved. This allows the shift operations to be performed over the full 32-bit A register, while the ALU retains its compact, 16-bit size. Shifts are performed by successive one bit shifting in this embodiment, requiring simply 32 three-way multiplexers, to give left shift, no shift or right shift per bit. Six different operations are possible, the appropriate one being selected by the control unit 300 in accordance with the current instruction: (1,2) load AH or AL directly with no shift; (3) shift AH and AL left; (4) load AH while shifting left; (5) shift AH and AL right; and (6) load AH while shifting right.

The number of bit positions to shift is specified as a memory/immediate value to the instruction. This is read off from the ALU B path by the control architecture and used to generate the appropriate number of cycles. Each cycle then shifts by one bit left or right.

In the condition logic 334, the SHIFT_IN value is selected to allow the carry flag C to be shifted in for SCL/SCR shifts, zero to be shifted in for SAL, and the current sign bit (AH bit 15) to be shifted in (extended) for SAR. SHIFT_OUT, the bit shifted out of A, is loaded into the carry flag C with each bit shifted.
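The one-bit-per-cycle shifting described above may be sketched in Python as follows. The function name and its argument order are hypothetical. The sketch treats A as a single 32-bit value formed from AH and AL, takes the shift count from the low four bits of the operand as indicated in the instruction set table, and updates the carry flag with SHIFT_OUT on every step.

```python
def shift(op: str, a: int, data: int, carry: int) -> tuple:
    """Apply SAL, SAR, SCL or SCR to the 32-bit A register.

    Returns (new_a, new_carry). One loop iteration models one cycle.
    """
    if op not in ("SAL", "SAR", "SCL", "SCR"):
        raise ValueError(f"unknown shift operation: {op}")
    a &= 0xFFFFFFFF
    for _ in range(data & 0xF):                  # count = data[3:0], 0..15
        if op in ("SAL", "SCL"):
            shift_in = carry if op == "SCL" else 0
            shift_out = (a >> 31) & 1            # bit leaving the top of AH
            a = ((a << 1) | shift_in) & 0xFFFFFFFF
        else:
            # SCR shifts the carry in; SAR extends the sign bit (AH bit 15).
            shift_in = carry if op == "SCR" else (a >> 31) & 1
            shift_out = a & 1                    # bit leaving the bottom of AL
            a = (a >> 1) | (shift_in << 31)
        carry = shift_out                        # SHIFT_OUT loaded into C
    return a, carry
```

For example, shift("SAR", 0x80000000, 4, 0) yields (0xF8000000, 0), the sign bit having been extended over four cycles.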

Multiply Instruction: MULT

Multiply is performed by repeated shift and add and takes 16 cycles. The algorithm uses a combination of the ALU 302 and the shift/load logic 312, shown schematically in FIG. 4A for one cycle. The shift/load block 312 is multiplexer controlled by the instruction decoder and, during MULT, by the current lowest bit A[0] in the A register. One of the operands is the initial value of AL, the other is a memory/immediate value presented on ALU input B. The result is produced in AH/AL. The initial value of AH acts as an addend to the result, and AH should normally be cleared before the operation. As shown by the asterisk (*), the ALU 302 executes an addition for the first 15 cycles, but a subtraction in the last cycle.

In pseudo code, the algorithm is as follows. A is the concatenation of AH and AL. {} indicates bit concatenation.

Repeat 16 times
    If last time round loop then op = SUBTRACT else op = ADD
    If A[0] == 1 then
        A[31:16] := A[31:16] op ALU_B     -- add (*) ALU_B into top 16 bits
        A[31:0] := {ALU_S,A[31:1]}        -- shift right
    else
        A[31:0] := {A[31],A[31:1]}        -- sign extend shift right
    end if
end

The ALU calculates A[31:16] “op” ALU_B on each cycle (“op” is either “plus” or “minus”, set by the control architecture according to the cycle). A[0] is fed to the control architecture. This then sets the shift/load logic 312 accordingly, either to sign extend shift right (when A[0]==0) or to load AH and shift right, shifting in the ALU sign flag S (when A[0]==1).
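The multiply sequence can be checked with the following Python sketch, given by way of example only. It takes the top 16 bits of A to be AH (bits 31 to 16), and it assumes that the ALU sign flag S is the true sign of the exact 17-bit sum or difference, so that no overflow is lost when S is shifted in. The function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def mult(ah: int, al: int, data: int) -> int:
    """Model MULT: returns the 32-bit A register after 16 cycles."""
    a = ((ah & 0xFFFF) << 16) | (al & 0xFFFF)
    b = to_signed(data, 16)
    for cycle in range(16):
        if a & 1:
            top = to_signed(a >> 16, 16)
            # Add on the first 15 cycles, subtract on the last one.
            result = top - b if cycle == 15 else top + b
            s = 1 if result < 0 else 0           # assumed: true sign of result
            # Load AH with the result, then shift right, shifting S in at bit 31.
            a = (s << 31) | ((result & 0xFFFF) << 15) | ((a & 0xFFFF) >> 1)
        else:
            a = (a & 0x80000000) | (a >> 1)      # sign extend shift right
    return a
```

The subtraction on the last cycle reflects the negative weight of bit 15 of a two's complement multiplier. With AH cleared, mult(0, 3, 5) returns 15, and mult(0, 0xFFFF, 5) returns FFFFFFFB, that is, minus 5.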

Divide Instruction: DIV

Divide is performed by repeated subtraction and takes 16 cycles. Again, the algorithm uses a combination of the ALU 302 and the shift/load logic 312, as shown schematically in FIG. 4B. The dividend is the initial value of AH,AL, the divisor is a memory/immediate value presented on ALU input B and the result and remainder are generated in AH,AL. The pseudo-code for DIV is:

Repeat 16 times
    ALU_RESULT := A[31:16] - ALU_B
    If ALU_CN == 1 then
        A[31:0] := {A[30:0],0}            -- shift left, ←0
    else
        A[31:16] := ALU_RESULT            -- subtract ALU_B from top 16 bits
        A[31:0] := {A[30:0],1}            -- shift left, ←1
    endif
end

The ALU calculates A[31:16] minus ALU_B on each cycle, and according to the value of ALU_CN, the control architecture sets the shift/load logic 312 either to shift left (when ALU_CN=1, indicating a borrow) or to load AH and shift left (when ALU_CN=0).
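A behavioural sketch of the divide sequence is given below in Python, for illustration only. It assumes non-negative operands and a quotient that fits in 16 bits, takes the top 16 bits of A to be AH (bits 31 to 16), and treats a negative difference as the “borrow” case (Cn=1). The function name is hypothetical.

```python
def div(a: int, data: int) -> int:
    """Model DIV: returns the 32-bit A register after 16 cycles.

    Assumes non-negative operands and no quotient overflow.
    """
    a &= 0xFFFFFFFF
    data &= 0xFFFF
    for _ in range(16):
        result = (a >> 16) - data                # ALU: AH minus divisor
        if result < 0:                           # borrow: Cn = 1
            a = (a << 1) & 0xFFFFFFFF            # shift left, shifting in 0
        else:
            a = (result << 16) | (a & 0xFFFF)    # load AH with the difference
            a = ((a << 1) | 1) & 0xFFFFFFFF      # shift left, shifting in 1
    return a
```

Because the comparison precedes the shift in each cycle, the quotient left in AL is that of A>>1 divided by the operand, and the remainder appears in AH[15:1], as stated in the instruction set table. For example, to divide 100 by 7 the value 200 is placed in A; div(200, 7) returns 0004000E, giving a quotient of 14 in AL and a remainder of 2 in AH[15:1].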

SIF Cycles

There are four cases of SIF cycles: memory read, memory write, register read and register write. These will be described more generally later, with reference to the generalised embodiment of FIG. 7. Briefly, and with reference to FIG. 3, SIF memory read cycles set multiplexer 326 to the SIF address and load the result directly and in parallel from the DATA bus into the SIF_DATA register 342.

SIF memory write cycles set multiplexer 326 to the SIF address and write directly from the SIF_DATA register 342 to the DATA bus via the SIF_WRITE tri-state buffer 346.

SIF register read cycles use the DATA bus as an intermediate. The register to be read is selected using multiplexers 320 and 322. This can be AH, AL, X, Y, PC, flags or the current instruction. This value is enabled onto the DATA bus via tri-state buffer 335 and from there to the SIF_DATA register 342 as if it were a memory read. Note that the top 2 bits of the current instruction are fed directly to the SIF_DATA register and will be loaded for any SIF read operation. These bits are only of interest when reading from the current instruction, however. Since each instruction word is eighteen bits wide, the DATA bus alone can only be used to carry the lowest sixteen bits.

SIF register write cycles again use the DATA bus as an intermediate. The value to be written is gated onto DATA via SIF_WRITE tri-state buffer 346 and from there through the ALU on the B path. ALU output Σ can then be loaded into the appropriate register.

Pin Description

The following provides a complete and detailed description of thefunction of each of the “pins” of the processor core. As shown in thebasic arrangement of FIG. 1, some of these pins are connected tophysical pins of the integrated circuit (ASIC), while most are connectedonly internally of the integrated circuit. In the following descriptionof pin functions, the pin name and a pin type input, output orbi-directional (tri-state) is presented. The functions and operations ofthe pin are then described.

FIGS. 5A to 5G of the drawings are presented to illustrate the waveforms present on the pins, as described below.

RST (Input)

Asynchronous reset. Resets processor to known state.

Registers and flags    0
Sleep state            Awake
Run/Stop state         Run
SIF cycle              None pending

On releasing RST the processor will start executing code from address 0. The internal reset signal is held active until a falling clock edge after RST is released to ensure a clean restart.

PCLK (Input)

Processor clock. As shown in FIG. 5A, each processor cycle requires 4 PCLK clocks. Both edges of the clock are used. Cycles always start on a rising clock edge and comprise this and the following 7 clock edges. The edges are referred to as clock 0 through clock 7 and the parts of the cycle are numbered according to the clock they follow. PCLK may be stopped and restarted at will to switch the processor on and off. Note that output signals will freeze in whatever state they were in and that the processor will not be sensitive to any input signals except RST whilst the clock is stopped.

The 4 clock sequencer is disabled when the processor is idle. The phase of cycles is therefore not fixed from reset for all time. Most processor instructions execute in a single cycle.

WAKE_UP (Input)

Wakes the processor from sleep. As shown in FIG. 5B, when the processor executes the SLEEP instruction it goes into a sleep mode where activity is minimised and the power consumption is very low. WAKE_UP is then sampled on the rising edge of every PCLK and the processor wakes up and restarts with the instruction following SLEEP when a 1 is detected. In terms of the cycle, the PCLK rising edge where the 1 is detected is clock 0. Note that if WAKE_UP is held at 1, SLEEP behaves just like a NOP.

STOPB_OUT (Output), STOPB_IN (Input), RUN_STEP (Input)

Processor start/stop, breakpoint and single step debug facility. This is intended only for debugging and is not for use by the application. These signals must be brought out to outside pins STOPB and RUN_STEP via pads as shown. The description that follows describes the pin level behaviour.

The processor is either running or stopped. This is a top level state above whether it is asleep or awake. STOPB is open-drain with an internal pullup and is normally high.

As shown in FIG. 5C, when the processor executes the BRK instruction, it drives STOPB low and stops. This indicates to the outside world that the processor has hit a breakpoint. The processor can be manually stopped by pulling STOPB low externally during a rising PCLK edge. The processor immediately takes over holding STOPB low and will stop at the end of the current instruction.

RUN_STEP is an input with a built-in pullup. As shown in FIG. 5D, when high, it forces the processor into the run state. RUN_STEP over-rides STOPB, so that if it is held high, the processor will run continually ignoring breakpoints and stop requests. In normal use, the pin should be allowed to pull itself into this condition. For debug, RUN_STEP should be normally low. Breakpoints are then enabled. Note that this means that strategic breakpoints can be left in the final code and enabled by control of RUN_STEP. This is a powerful tool for test and verification purposes. RUN_STEP is then taken high for one clock to restart the processor.

As shown in FIG. 5E, single stepping requires control of both STOPB and RUN_STEP (illustrated for a single cycle instruction):

(1) Take RUN_STEP high
(2) Wait until STOPB rises
(3) Take RUN_STEP low and drive STOPB low
(4) Wait ≧ 1 clock
(5) Release STOPB

PROG_ADDR (Output), PROG_CLK (Output), INSTRN (Input)

Program space memory interface. Unclocked or clocked ROMs can be used. PC is the instruction address (16 bits). As shown in the FIG. 5F waveforms, PROG_CLK is the clock for clocked ROM (active rising edge) and INSTRN is the instruction data (18 bits).

For multi-cycle instructions, PROG_CLK rises on the first cycle, stays high for intermediate cycles and falls on the last cycle. PC changes only on the last cycle.

ADDR, R_WB (Output), LIM_WRITEB (Output), DATA_CLK (Output), DATA (Bidirectional)

Data space memory interface. Unclocked or clocked memory can be used. ADDR is the address (16 bits). R_WB is the read/write select line, LIM_WRITEB allows protection of parts of the memory space from SIF writes, DATA_CLK is a cycle strobe which also acts as a clock for clocked memory and DATA is the bidirectional data bus (16 bits). See the waveforms of FIG. 5G.

If an instruction does not require a memory cycle, DATA_CLK will stay low. For multi-cycle instructions, DATA_CLK will go high on the first cycle, stay high for intermediate cycles and go low on the last cycle. ADDR etc. will stay valid from the first cycle to the last cycle.

LIM_WRITEB is normally high. It goes low when a write to memory is requested from the SIF in normal operation (as opposed to in debug with the processor stopped). Including this signal in an address decode prevents user writes to the decoded device from the SIF. This is necessary for protecting some applications against accidental damage. Other devices may be written from the SIF, allowing for instance communication between the ASIC and a general purpose microcontroller.

TEST_OUT (Output)

IC test output. An XOR of the ALU output bus which can be used to give visibility of processor operation for IC test. It is either brought out as a specific pin or included in an XOR tree which is externally visible.

SER_IN (Input), SER_OUT (Output), SER_CLK (Input), SER_LOADB_OUT (Output), SER_LOADB_IN (Input)

Serial interface (SIF). These signals are brought to outside pins SER_IN, SER_OUT, SER_CLK and SER_LOADB via pads as shown. The description that follows describes the pin level behaviour.

The SIF provides a method for transferring data serially into and out of a device, by means of a shift register in the device. The present processor uses a SIF shift register length of 36 bits. The data input to the shift register is SER_IN and the data output SER_OUT. SER_CLK is the clock. Data is clocked into the shift register on the negative edge, and out on the positive edge. SER_CLK is completely asynchronous to the processor clock and can be faster or slower than it if required. Data is clocked in and out MSB first. SER_LOADB is used to co-ordinate transfers via the shift register.

The serial interface allows (1) the program space to be read, (2) the data space to be read and written and (3) the processor registers to be read and written. In normal running, SIF transfers are only carried out when a SIF instruction is executed. When the processor is asleep or stopped, SIF transfers are carried out immediately, starting the cycle engine for one cycle to perform the transfer. In normal running, the registers and limited data space areas cannot be written. In the stopped state the registers and the full data space can be written to.

SER_LOADB is an open-drain output with an internal pullup. To perform a cycle, the shift register is loaded and SER_LOADB pulled low for ≧ 1 clock. The processor detects the transfer request and holds SER_LOADB low until the cycle has been completed. This handshake system is robust and places no timing constraints on either side of the interface. SER_CLK must be low when SER_LOADB is low to allow the shift register to be loaded by the processor.

FIG. 6 shows the SIF shift register arrangement. The operation of the SIF will be described in more detail below. The SLEEP bit is read only. Undefined bits read as 0.

Program space read operations read from the PC address and increment PC. This allows sequential reads without specifically setting up an address. The definition allows just ‘1’s to be shifted into SER_IN as the data is read out in order to repeat the read.

Notes

Literals which are outside the range −512 . . . +511 can be automatically implemented as direct by the assembler with the value stored in data ROM. From the user point of view it will seem that full range literals are available.

Branch targets which are outside the PC relative range can be automatically implemented as direct by the assembler with the offset stored in data ROM. From the user point of view it will seem that branches to anywhere in address space are available. When executing a PC relative branch, PC points to the branch instruction itself.

Subroutine calls use the X register for the return address:

        BSR fred        ; Branch to subroutine with BSR
fred    . . .           ; Subroutine
        :
        BRA 0,X         ; Return with BRA

If the subroutine itself calls a subroutine or otherwise uses X, it must be saved in memory and restored before exit. Conditional returns can be implemented by using BCC instead of BRA. Multiple exit points can be implemented by returning to 0,X or 1,X etc.

Addition and subtraction operations work on signed or unsigned, integer or fractional values. ADD, ADDC and NADD treat C as a carry. SUB, SUBC and CMP treat C as a borrow. ADDC and SUBC facilitate straightforward multi-precision arithmetic. The S flag applies to operations on signed values and gives the true sign of the result, independent of overflow into the MSB. It is calculated as N ⊕ V (N exclusive-OR V) and is especially useful for CMP where it indicates signed “less than” (N only indicates this over a limited operand range).
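The relationship between the N, V and S flags can be illustrated with a short sketch. It is not taken from the processor design: the function names are hypothetical, V is assumed to be the usual two's complement overflow flag, and S is assumed to be the exclusive-OR of N and V. Under those assumptions S matches a signed “less than” for every operand pair, whereas N alone does not.

```python
MASK = 0xFFFF

def to_signed(x):
    """Interpret a 16-bit pattern as a two's complement value."""
    x &= MASK
    return x - 0x10000 if x & 0x8000 else x

def cmp_flags(a, b):
    """Flags after a 16-bit compare a - b, with C treated as a borrow."""
    a &= MASK
    b &= MASK
    r = (a - b) & MASK
    n = r >> 15                          # sign bit of the truncated result
    # Overflow (assumed definition): operand signs differ and the
    # result sign differs from the sign of a.
    v = ((a ^ b) & (a ^ r)) >> 15
    c = 1 if a < b else 0                # borrow out of the subtraction
    z = 1 if r == 0 else 0
    s = n ^ v                            # true sign of the result
    return {"C": c, "Z": z, "N": n, "V": v, "S": s}

# -32768 compared with +1: the subtraction overflows, so N reads 0
# although the first operand is smaller; S still reads 1.
flags = cmp_flags(0x8000, 0x0001)
print(flags["N"], flags["S"])
```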

MULTiply is signed. Integer and fractional multiplies are implemented as follows (“&” is the assembler macro parameter substitution operator):

IMULT   data            ; Multiply integers AL * data. Result in A
        LD AH,#0
        MULT &data
FMULT   data            ; Multiply fractionals AL * data. Result in A
        LD AH,#0
        MULT &data
        SAL #1

Integer multiply never overflows, even if AH is non-zero to add to the result. Fractional multiply only overflows when multiplying −1.0 by −1.0. AH adds to the result as a signed fractional divided by 2¹⁵, and cannot precipitate overflow.
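The overflow behaviour just described can be checked numerically. The following is a minimal sketch, assuming that MULT forms the signed 16 x 16 product and adds the sign-extended AH to it, and that the fractional format is the common Q15 convention (value = integer / 2¹⁵). The function names are illustrative only.

```python
def imult(al, data, ah=0):
    """Signed 16 x 16 multiply with AH added in (assumed behaviour of MULT)."""
    return al * data + ah

def fmult(al, data, ah=0):
    """Fractional multiply: the integer product shifted left one bit (SAL #1)."""
    return imult(al, data, ah) << 1

def fits_signed32(x):
    """True if x is representable in the 32-bit A register."""
    return -2**31 <= x < 2**31

# 0.5 * 0.5 in Q15 gives 0.25 in Q31.
print(fmult(0x4000, 0x4000) == 2**29)

# -1.0 * -1.0 is the single fractional case that does not fit.
print(fits_signed32(fmult(-32768, -32768)))
```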

Unsigned integer multiply is implemented by a macro:

UIMULT  data, temp      ; Multiply unsigned integers AL
                        ; and data. Result in A
                        ; temp is a workspace RAM
                        ; location that must be supplied
        LD AH,#zero
        ST AL,@&temp
        MULT &data
        TST &data
        BPL $+2
        ADD AH,@&temp
        TST @&temp
        BPL $+2
        ADD AH,&data

However, if the code of UIMULT is used without zeroing AH it does not provide an unsigned multiply/accumulate, as AH is added as a signed value (sign extended).
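The correction applied by UIMULT can be followed in a short model. This is a sketch of the arithmetic only, assuming 16-bit operands and a 32-bit A register with AH zeroed beforehand; the function names are hypothetical. A signed product is formed first, and each operand whose top bit is set causes the other operand to be added into the upper word.

```python
def to_signed16(x):
    """Interpret a 16-bit pattern as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def uimult(al, data):
    """Unsigned 16 x 16 multiply built from a signed multiply, as in UIMULT."""
    temp = al & 0xFFFF                    # ST AL,@&temp
    data &= 0xFFFF
    a = (to_signed16(temp) * to_signed16(data)) & 0xFFFFFFFF  # MULT, AH = 0
    ah, low = a >> 16, a & 0xFFFF
    if data & 0x8000:                     # TST &data: BPL skips when MSB clear
        ah = (ah + temp) & 0xFFFF         # ADD AH,@&temp
    if temp & 0x8000:                     # TST @&temp
        ah = (ah + data) & 0xFFFF         # ADD AH,&data
    return (ah << 16) | low

print(hex(uimult(0xFFFF, 0xFFFF)))
```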

DIVide is signed, positive only (MSB of A and data must be zero). Positive integer and fractional divides are implemented by macros in the assembler as follows:

PIDIV   data            ; Divide +ve integers A ÷ data.
                        ; Result in AL, remainder in
                        ; AH[15:1], AH[0]=0
        SAL #1
        DIV &data
PFDIV   data            ; Divide +ve fractionals A ÷ data.
                        ; Result in AL, rounding bit
                        ; AH[15]
        DIV #data

Note that A is a long integer or a double precision fractional in the above. PIDIV and PFDIV will overflow if AH ≧ data. No indication of overflow is provided.

To implement a full signed divide, the operands must be made positive before the divide and the result corrected afterwards.
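One way of wrapping a positive-only divide to obtain a signed divide is sketched below. The helper pidiv stands in for the positive divide and is hypothetical, as is the choice of rounding the quotient towards zero with the remainder taking the sign of the dividend; the text does not specify either convention.

```python
def pidiv(a, data):
    """Positive-only divide, standing in for the positive divide macro."""
    assert a >= 0 and data > 0
    return a // data, a % data

def signed_div(a, data):
    """Signed divide: make operands positive, divide, then correct the signs."""
    q, r = pidiv(abs(a), abs(data))
    if (a < 0) != (data < 0):
        q = -q                  # quotient is negative when the signs differ
    if a < 0:
        r = -r                  # remainder follows the dividend (assumed)
    return q, r

print(signed_div(-7, 2))
```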

Shifts can be from 0 to 15 bits. A 0 bit shift leaves A unchanged and sets C as if a 1 bit shift had been performed. This can be used to get the top or bottom bit of A into C.

Some other instructions common on other processors are implemented as macros:

CLC                     ; Clear carry bit. Note affects S,N,Z
        ADD AL,#0
SEC                     ; Set carry bit. Note affects S,N,Z
        NADD AL,#-1
        XOR AL,#H′FFFF
NEG     reg             ; Negate reg. C is set as carry. NEG
                        ; instrn often sets as borrow
        NADD &reg,#0
NEGA                    ; Negate 32 bit A register. Note C set
                        ; as carry
        XOR AL,#H′FFFF
        NADD AL,#0
        ADDC AH,#0
RTS                     ; Return from subroutine
        BRA 0,X

Branch instructions for the full range of signed and unsigned comparisons are implemented as macros:

BHI     branch_addr     ; Branch higher (unsigned)
        BCS &skip
        BNE &branch_addr
&skip:
BHS     branch_addr     ; Branch higher or same (unsigned)
        BCC &branch_addr
BLO     branch_addr     ; Branch lower (unsigned)
        BCS &branch_addr
BLS     branch_addr     ; Branch lower or same (unsigned)
        BCS &branch_addr
        BEQ &branch_addr
BGT     branch_addr     ; Branch greater than (signed)
        BLT &skip
        BNE &branch_addr
&skip:
BGE     branch_addr     ; Branch greater or equal (signed)
        BLT &skip
        BRA &branch_addr
&skip:
BLE     branch_addr     ; Branch less than or equal (signed)
        BLT &branch_addr
        BEQ &branch_addr

The primitives BLT, BEQ and BNE complete the set.
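The conditions tested by the branch macros can be tabulated against ordinary comparisons. The sketch below assumes that the branch follows a CMP of two 16-bit operands, that C is the borrow, and that the BLT primitive tests the S flag (taken here as N exclusive-OR V); these are assumptions made for illustration, and all function names are hypothetical.

```python
def signed16(x):
    """Interpret a 16-bit pattern as a two's complement value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def flags(a, b):
    """C (borrow), Z and S after comparing 16-bit a with b."""
    a &= 0xFFFF
    b &= 0xFFFF
    r = (a - b) & 0xFFFF
    c = int(a < b)
    z = int(r == 0)
    s = ((r ^ ((a ^ b) & (a ^ r))) >> 15) & 1    # N xor V
    return c, z, s

def taken(macro, a, b):
    """True if the named macro would branch after CMP a, b."""
    c, z, s = flags(a, b)
    return {
        "BHI": not c and not z,    # BCS skip / BNE target
        "BHS": not c,              # BCC target
        "BLO": bool(c),            # BCS target
        "BLS": bool(c or z),       # BCS target / BEQ target
        "BGT": not s and not z,    # BLT skip / BNE target
        "BGE": not s,              # BLT skip / BRA target
        "BLE": bool(s or z),       # BLT target / BEQ target
    }[macro]

print(taken("BHI", 0xFFFF, 1), taken("BGT", 0xFFFF, 1))
```

With 0xFFFF and 1 as operands the unsigned comparison branches and the signed one does not, since 0xFFFF reads as −1 when signed.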

It is best to avoid using manual PC relative branches (e.g. BRA $+1). These will not work correctly in the multiple instruction macros above and in general make assumptions about instruction lengths which can cause problems when branching over macros.

BRK, if it causes the processor to stop, leaves PC pointing at the BRK instruction. If it is over-ridden by the external signal RUN_STEP, BRK behaves as a NOP.

PRINT is a debugging aid. It is a NOP as far as the processor is concerned but is detected by a gate level model debugger and a simulator and, if enabled, will print out the instruction's immediate value and specified register. The immediate value can be used to indicate the position in the program. The register value may or may not indicate some useful result.

Multi-word variables can be stored in memory MS word first, as a convention. There is nothing in the instruction set to define an order. The following code fragment shows a multi-precision subtract:

        LD AL,@(var1+1)         ; Subtract var1 - var2
        SUB AL,@(var2+1)
        LD AH,@var1             ; Result in A. C,S,N correct for
        SUBC AH,@var2           ; 32 bit result. Z set on MS word only
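The borrow chain of the fragment above can be modelled as follows. This is an illustrative sketch only: each variable is represented as a hypothetical (MS word, LS word) pair, following the MS-word-first convention, and C is treated as a borrow as stated for SUB and SUBC.

```python
def sub32(var1, var2):
    """32-bit subtract from two 16-bit steps; returns ((AH, AL), final C)."""
    low = var1[1] - var2[1]            # SUB AL,@(var2+1)
    c = int(low < 0)                   # C set as a borrow
    low &= 0xFFFF
    high = var1[0] - var2[0] - c       # SUBC AH,@var2 takes the borrow in
    c = int(high < 0)
    high &= 0xFFFF
    return (high, low), c

# 0x00010000 - 0x00000001: the borrow ripples into the upper word.
print(sub32((0x0001, 0x0000), (0x0000, 0x0001)))
```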

The Y register can be used as a software controlled stack pointer. The mnemonic “SP” is recognised in place of Y by the assembler. When used as a stack, Y points to the lowest used memory word in the stack. The following illustrates the use of Y for storing the return address and local variables for a subroutine:

fred    LABEL
fred_s  STRUC                       ; Define local vars in struct
var1    DS 1
var2    DS 1
fred_s  ENDS
        ST X,@(−1,SP)               ; Store return address
        SUB Y,#(LENGTH fred_s+1)    ; Allocate space
        . . .
        . . . @(*var1,SP)           ; Local variable address
        . . .
        ADD Y,#(LENGTH fred_s+1)    ; Return space
        BRA @(−1,SP)                ; Return directly from stack

Note the convention of storing the return address first on the stack.

Construction and Operation of the Serial Interface (SIF)

As described above with reference to FIGS. 1 to 6, the serial interface (SIF) operates to allow external access to the memory spaces and registers of the processor via the external pins of the integrated circuit. FIG. 7 shows one practical implementation. The processor in FIG. 7 may be the same as that of FIGS. 1 to 6, but could equally be of an entirely different design.

The main part of the processor is illustrated at 500, and is shown schematically connected to the program memory 104, and the data memory and I/O 106, 108, 110. Within the main part of the processor core, an instruction latch 502 receives program instruction words from the program memory 104, which are decoded by a control section 504 of the processor. The registers of the processor are shown at 506. The arithmetic and logic unit (ALU) and other functional units of the processor core are grouped schematically in a block 508. A data bus of the processor is shown at 510. FIG. 7 does not show in any detail the data paths and control lines between the various elements 502 to 510, which can readily be implemented using conventional techniques, for example to implement the functionality detailed above in relation to the instruction set, data architecture and pin description.

The serial interface shift register is physically embodied at 512 and features the data field, address field, read/write field and address space field which are as shown in FIG. 6. In response to the serial interface clock signal SER_CLK, one bit at a time can be shifted from the shift register 512 out of the chip (SER_OUT) and/or into the chip (SER_IN). An interface control block 514 is shown associated with the shift register 512 and the fourth control line of the interface (SER_LOADB), while in practice the control units 504 and 514 of the processor may be implemented as a single functional unit, or further sub-divided as desired.

The address and read/write signals applied to the data memory 106, 108, 110 are primarily supplied (ADDR', R_WB') by the normal functional elements 504, 508 of the processor core 500, for example as described in the Pin Description above. However, a multiplexer 516 is provided in these address and control lines, so that the address applied to the memory can instead be that which is contained within the interface register 512 (SIF_ADDR). Similarly, the read/write control bit for the data memory can be derived from the relevant bit of the shift register 512 (SIF_R_WB) when the multiplexer 516 is controlled appropriately. In particular, a control line SIF_CYC is driven by the control and instruction decoding section 504 of the processor, so that the multiplexer is activated in this way during the cycle of execution of the special SIF instruction, assuming that the bit SIF_A_SPACE indicates that the data address space is to be accessed in a given SIF operation.

Similarly, a multiplexer 518 and tri-state buffer 520 are driven to cause the data bits of the shift register 512 to be driven onto, or loaded from, the data bus 510 during a SIF write or SIF read operation, respectively. The main part of the processor core 500 ignores the data present on the data bus 510 during the execution cycle of a SIF instruction.

For the case where access to the registers or program memory space of the processor is desired through the serial interface SIF, as indicated by the bit SIF_A_SPACE of the word loaded into the shift register 512, the multiplexer 518, a tri-state buffer 522 and a bi-directional selection circuit 524 provide access between the data bits of the shift register 512 and the internal registers 502, 506 of the processor core 500. The selection circuit 524 is controlled by the lower 4 bits SIF_ADDR[3:0] of the address field within the shift register 512, as detailed in FIG. 6. Therefore, during a SIF instruction execution cycle, any of the registers PC, F (flags), AH, AL, X or Y can be read or written, or the currently addressed location of the program memory 104 can be read via the instruction latch 502. As described already, the program counter value stored in register PC can be incremented automatically to allow sequential access to a range of locations in the program memory 104, when the debug mode is activated. In practice, the selection circuit 524 and multiplexer 518 may readily be combined with existing data path selection components of the processor core 500, to achieve a very compact circuit.

The particular mode of handshaking between the chip (ASIC) and the external apparatus is explained in the Pin Description above, but will now be further illustrated with reference to FIGS. 8 and 9. In the left hand flow chart of FIG. 8, the actions of the external apparatus wishing to write a data value into a location within the storage space of the integrated circuit begin at 600. At 602, while the fourth wire SER_LOADB remains high (passive state), the data line SER_IN and the serial clock line SER_CLK are used to load the various bits of the shift register 512 within the ASIC. A matching shift register will typically be provided within the external apparatus for this purpose.

When the shift register on chip contains a complete SIF instruction word, the external apparatus (step 604) sets the fourth wire SER_LOADB to an active value (0) for a long enough time (one clock cycle or more) that the SIF control circuit 514 within the ASIC can recognise that an interface operation is desired. The external apparatus then releases the fourth wire at step 606, and then (step 608) watches the fourth wire to see whether it returns to the passive state (1) or whether it is being held active (0) by the ASIC. Only when the fourth wire returns to the passive state is the SIF write operation considered complete (610).

Referring now to FIG. 9, the operations within the ASIC in relation to the serial interface begin at 700. At 702, the fourth wire SER_LOADB of the interface is monitored by the SIF control unit 514 of the ASIC, until it is seen to enter the active state. At 704, the ASIC immediately actively holds the fourth wire in the active state. At 706, the command which has been loaded into the register 512 of the interface is processed during the next available SIF instruction cycle. When a write instruction is indicated by the bit SIF_R_WB in the register, this causes the DATA field to be written into the storage location of the ASIC determined by the fields SIF_A_SPACE and SIF_ADDR. In the case of a read operation, the DATA bits of the register 512 are loaded with data read from that location. Only after the SIF operation has been processed within the ASIC does the control circuit 514 release the fourth wire of the interface (step 708). This is to inform the external apparatus that the write operation is completed, or that the information to be read can be clocked out of the serial interface now.

It is not guaranteed, however, that the external apparatus will already have released the fourth wire (step 606), and a loop is implemented at step 710 to monitor the state of the fourth wire. Only when this wire is seen to go high again does control return to step 702. In this way, repeated SIF instructions are not implemented merely because the external apparatus is very slow to release the fourth wire.
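The handshake on the fourth wire can be walked through with a small behavioural model. This is a sketch rather than a description of the circuit: the class and function names are hypothetical, the wire is modelled as an idealised wired-AND line with a pull-up, and the steps of both flow charts are executed in one fixed order.

```python
class OpenDrainWire:
    """Wired-AND line with a pull-up: reads 0 while any party pulls it low."""

    def __init__(self):
        self.pulling = set()

    def pull_low(self, who):
        self.pulling.add(who)

    def release(self, who):
        self.pulling.discard(who)

    @property
    def level(self):
        return 0 if self.pulling else 1


def sif_transfer(process_command):
    """Walk one transfer through the handshake; returns (result, wire levels)."""
    wire = OpenDrainWire()
    trace = []

    wire.pull_low("master")        # step 604: external apparatus requests a cycle
    trace.append(wire.level)
    if wire.level == 0:            # step 702: ASIC sees the active state
        wire.pull_low("asic")      # step 704: ASIC holds the wire low itself
    wire.release("master")         # step 606: master lets go, wire stays low
    trace.append(wire.level)
    result = process_command()     # step 706: command in the register is processed
    wire.release("asic")           # step 708: ASIC signals completion
    trace.append(wire.level)       # steps 608/610: master sees the passive state
    return result, trace


result, trace = sif_transfer(lambda: 0x1234)
print(hex(result), trace)
```

Because either side can hold the line low, neither side needs to know how long the other will take, which is the property relied on in the text.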

Returning to FIG. 8, the right hand flow chart beginning at step 620 illustrates the SIF read operation. Steps 622 to 628 are the same as corresponding steps 602 to 608 of the SIF write operation, except that at step 622, no data needs to be clocked into the shift register 512 of the ASIC, saving time. Also, of course, the bit SIF_R_WB is set to indicate read instead of write. After it has been detected at step 628 that both the external apparatus and the ASIC have released the fourth wire (SER_LOADB=1) the data read from the desired address or register within the ASIC is present in the DATA bits of the shift register 512, and is clocked out of the shift register into the external apparatus in step 630.

Double broken lines in FIG. 8 connect step 630 with step 602 or 622 to illustrate that, if desired, successive SIF operations can be pipelined by performing the step 630 of reading the shift register 512 simultaneously with the step 602 or 622 of loading the register 512, using the data input and output wires SER_IN and SER_OUT in parallel.
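The pipelining of successive operations amounts to a full-duplex exchange through the 36-bit shift register. The following sketch models that exchange at the bit level. It assumes that bits enter at the least significant end and leave from the most significant end, consistent with MSB-first transfer; the function name is hypothetical and the two clock edges are merged into one loop iteration.

```python
SIF_BITS = 36

def exchange(register, new_word):
    """Shift new_word in MSB first while the old contents shift out MSB first."""
    mask = (1 << SIF_BITS) - 1
    out = 0
    for i in range(SIF_BITS - 1, -1, -1):
        out_bit = (register >> (SIF_BITS - 1)) & 1   # bit presented on SER_OUT
        in_bit = (new_word >> i) & 1                 # bit presented on SER_IN
        register = ((register << 1) | in_bit) & mask
        out = (out << 1) | out_bit
    return register, out

# The previous result is read out while the next command is loaded.
reg, read_back = exchange(0xABCDEF012, 0x123456789)
print(hex(reg), hex(read_back))
```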

Power Saving Features

In a static CMOS or similar circuit implementation, power dissipation arises only during signal transitions. In the processor described above overall power dissipation is greatly reduced, even compared with conventional “low-power” designs, by interposing latches at various stages within long combinational logic data paths including for example the arithmetic and logic unit (ALU). Within each instruction cycle, which is subdivided into a number of ‘clock states’ S0-S7, the latches are clocked only at a given state of the clock within the cycle, when all data and control lines which form inputs to the relevant stage have settled to a meaningful value. Transient states arising at the inputs of each stage before that time therefore cannot cause superfluous transitions to propagate far within the combinational circuitry of the processor throughout the instruction cycle.

In the example processor of FIGS. 1 to 6, such latches are provided for example on the current instruction (INSTRN), to isolate the decoder from ripple on the instruction bus. There are also latches on the ALU inputs A, B and C0, and also on the control lines from the instruction decoder 300. In addition to power saving in this way, these and further latches may be used to latch external inputs and to protect against “clock skew” at various points.

Referring again to FIG. 3, it is also a power saving feature of the present processor design that most parts of the data architecture have dedicated data buses, rather than a shared data bus. The DATA bus which leaves the processor core and which is also connected to the SIF register is an exception, in that it is driven by logic with tri-state output buffers 345, 346 etc. However, as described above, the output Σ of the ALU 302 is connected by dedicated pathways to the registers 304 to 310, and the outputs of these registers are similarly connected to the inputs of the ALU by dedicated pathways and multiplexers 320 to 328.

Compared with conventional processor designs in which all such elements are interconnected by means of shared data buses, the present processor design has eliminated many tri-state buffers that would otherwise be required at the outputs of the ALU and registers to drive a common data bus. Also, each dedicated data path has a lower capacitance than the conventional shared bus, with the end result that the power consumption of the processor core is lower.

FIG. 10 shows schematically circuitry implemented for monitoring wake up signals during the SLEEP or STOPPED states of the processor. In these states, the main functional elements of the processor are not clocked by the usual high frequency clock signal (PCLK in this embodiment), to reduce power dissipation in those circuits. On the other hand, external inputs must be monitored to cause the clock signal to be applied once more to these parts. In the present embodiment, signals such as STOPB_IN, RUN_STEP and WAKE_UP must be monitored, and also SER_LOADB_IN. Any of these may require the processor to perform at least one cycle of operation. In an alternative processor embodiment, external signals indicating interrupt requests may typically need similar monitoring.

FIG. 10 shows the main high frequency clock signal PCLK input to the processor core. This is gated at 800 with a clock enable signal CENFF to generate an internal clock signal PCLKI. The main part of the processor circuitry is driven by PCLKI, so that when CENFF is low, the main processor circuitry is not running and power dissipation is reduced. A trigger circuit 812, however, is perpetually clocked by PCLK. An input on the trigger circuit 812 receives an asynchronous clock enable signal CEN and, at its output, generates the clock enable signal CENFF which is applied to the gate 800.

Different external signal lines EXT1, EXT2 etc. are to be monitored, and each has its own flip flop 808, 810 etc. and XOR gate 804, 806 etc. for this purpose. The XOR gates 804, 806 produce individual clock enable signals CEN1, CEN2, from which an OR gate 802 generates the combined clock enable signal CEN.

In operation, flip flop 808 has a D input monitoring the external signal EXT1, but is clocked only by the internal clock signal PCLKI (output of gate 800). Therefore, when the processor is stopped or sleeping, the Q output of flip flop 808 carries a signal EXT1FF which is a state of the EXT1 signal, memorised from the last time that the processor was running. At the same time, there is no power dissipation in the flip flop 808. The XOR gate 804 compares the actual input signal EXT1 with EXT1FF, and generates the individual wake up signal CEN1=1 as soon as there is any change in the external signal EXT1 relative to EXT1FF. By the operation of OR gate 802, CEN goes high, and the trigger circuit 812 sets CENFF high, enabling the internal clock PCLKI for the entire processor. At this point, the present state of signal EXT1 becomes latched also as EXT1FF, so that the CEN1 signal itself disappears.

A similar operation is provided in relation to input EXT2 by the XOR gate 806 and the flip flop 810. Any number of such inputs can be provided, with the same or different monitoring circuitry.

Compared with a circuit in which, for example, each individual flip flop 808, 810 is clocked by the running clock PCLK to generate a synchronous clock enable signal, the present arrangement achieves a reduced power dissipation in the sleeping or stopped state. Only the clock input of a single flip flop within the trigger circuit 812, in addition to the input of the gate 800, needs to be continuously supplied with the running clock PCLK, no matter how many inputs are being monitored. To enter the sleep or stopped state, a clock disable signal CDIS from elsewhere in the control logic of the processor is applied to the trigger circuit 812, which sets CENFF low and so disables the internal clock PCLKI.
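The behaviour of the wake up monitoring can be followed in a cycle-level model. The sketch below is a simplification and not the circuit itself: the class name and method names are hypothetical, each call represents one rising PCLK edge, and it is assumed that CDIS takes priority over CEN inside the trigger circuit.

```python
class WakeMonitor:
    """Behavioural model of the FIG. 10 clock-enable logic (timing simplified)."""

    def __init__(self, n_inputs):
        self.ext_ff = [0] * n_inputs   # EXT1FF, EXT2FF, ...: values seen while running
        self.cenff = 0                 # output of trigger circuit 812

    def cen(self, ext):
        # One XOR per input (CEN1, CEN2, ...) combined by the OR gate.
        return int(any(e ^ f for e, f in zip(ext, self.ext_ff)))

    def pclk_rising(self, ext, cdis=0):
        """Apply one rising PCLK edge; returns 1 if PCLKI was passed on this edge."""
        pclki = self.cenff             # gate 800 passes PCLK only while CENFF is high
        wake = self.cen(ext)           # asynchronous CEN as seen just before the edge
        if pclki:
            self.ext_ff = list(ext)    # monitoring flip flops are clocked by PCLKI only
        if cdis:
            self.cenff = 0             # CDIS from the control logic stops the clock
        elif wake:
            self.cenff = 1             # any changed input restarts the clock
        return pclki


m = WakeMonitor(2)
m.pclk_rising([0, 0])          # asleep, nothing changes
m.pclk_rising([1, 0])          # EXT1 changes: CENFF is set on this edge
print(m.pclk_rising([1, 0]))   # the internal clock now runs and EXT1FF catches up
```

In the sleeping state only the trigger circuit sees PCLK in this model, however many inputs are listed, which mirrors the power argument made in the text.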

Application Example—Domestic Gas Meter

WO-A1-95/04258 describes an ultrasonic domestic gas meter apparatus, which measures gas velocity by a “time of flight” principle. To be successful as a domestic gas meter, such a product must be designed for a world market and have different versions conforming to several national standards. It should be powered by batteries which will last more than ten years, even in a wide range of hot, cold, dry and damp environments.

The principles of operation of the ultrasonic gas flow meter, and the measurement algorithms to be implemented are described in more detail in WO-A1-95/04258. Briefly, two ultrasonic transducers of the gas flow meter are used both to transmit and receive ultrasonic pulses. One transducer is installed upstream of the gas flow in the meter, the other downstream. In operation, the upstream transducer sends an ultrasound pulse to the downstream transducer, and the time of flight for this pulse is measured with very high precision. Then the downstream transducer sends the pulse to the upstream transducer, and the time of flight is again measured. If the gas velocity is zero, both readings will be the same. However, as the gas flow increases, the time of flight downstream will become shorter than the time of flight upstream. From this information it is possible to calculate the gas velocity but only with sophisticated signal processing. Such readings are taken every few seconds, and the accumulated volume of gas which has flowed is updated.
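The time of flight principle can be illustrated with the idealised textbook relation for a path aligned with the flow. This relation is an assumption introduced here for illustration and is not the measurement algorithm of WO-A1-95/04258: with path length L, speed of sound c and gas velocity v, the downstream time is L/(c+v) and the upstream time is L/(c−v), so that v = (L/2)(1/t_down − 1/t_up) without needing c. The function name and the numerical values are hypothetical.

```python
def gas_velocity(path_length_m, t_down_s, t_up_s):
    """Idealised axial-path estimate: v = (L / 2) * (1 / t_down - 1 / t_up)."""
    return 0.5 * path_length_m * (1.0 / t_down_s - 1.0 / t_up_s)

# Illustrative numbers only: a 0.2 m path, 430 m/s sound speed, 1.5 m/s flow.
L, c, v = 0.2, 430.0, 1.5
t_down = L / (c + v)
t_up = L / (c - v)
print(gas_velocity(L, t_down, t_up))
```

A real meter has to extract the two times from noisy received waveforms, which is where the signal processing referred to in the text is required.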

The signal processing electronics of the gas meter are therefore required to drive and receive signals from an ultrasonic transducer, to make very accurate time and voltage measurements, to perform sophisticated signal processing and yet be flexible enough to pass the requirements of several national standards. At the same time, the circuitry should run at a very low voltage, have a very low average current consumption, have a very good long term reliability and very low unit cost when manufactured in high volume.

Nowadays an ASIC (custom IC chip) solution might naturally be adopted to implement a specialised instrument of this type, and CMOS ASIC processes in particular are known to provide many advantages such as low cost and low power consumption. The use of this technology enables the development for example of custom analog cells to drive and receive signals from the ultrasonic transducers.

However, the extreme low power consumption desired in this example product can only be obtained when the program controlled processor which implements the signal processing and control functions is also integrated, with its program store, on the same chip. Also, having the processor on-chip would reduce interference emissions and reduce susceptibility to interference from outside. Aside from the obvious benefits of compliance with EMC regulations with less shielding, higher emissions within the product would impair the accuracy of the measurement electronics.

Unfortunately, as mentioned in the introductory part of this application, conventional processor designs tend to be too expensive in terms of chip area, and/or are not powerful enough per instruction for arithmetic-intensive applications. Large circuit size and high processor clock speed will only increase the power consumption. The problem of verification of the design also arises when all components and the control program are fixed in the chip hardware, and when the program implements real-time operations. It will not normally be possible with conventional designs to be sufficiently confident in the design and programming to commit both to hardware. Finally, when the design is completed, it must contain explicit provision for any product modifications that may be necessary in the future.

The novel processor design described above largely overcomes these drawbacks, while the processor and serial interface can be provided as one or two cells in an ASIC design “library”, to be incorporated in a wide range of designs.

FIG. 11 shows schematically the arrangement of the electronics for such a gas meter in which a CMOS ASIC 1100 includes a data processor of the type shown in FIG. 1. The ASIC includes processor core 100, program memory 104, data ROM 106 and data RAM 108. ASIC 1100 also includes specialised digital circuitry 1102 for signal processing and control functions and also specialised analog circuitry 1104, including transducer drivers, switches and digital-to-analog converters. On the same printed circuit board 1105 are connections for ultrasonic transducers 1106 and 1108, a 3 volt Lithium battery 1110, a small liquid crystal display 1112, a conventional low cost microprocessor chip 1114 and some electrically erasable, programmable non-volatile memory (EEPROM) 1116. These last two components communicate with ASIC 1100 via the serial interface (SIF) 1118, which also provides a test port 1120 for the gas meter electronics.

In performing signal processing, the powerful 16 bit arithmetic functions with the 32 bit accumulator register allow the algorithms to be implemented with a minimum of power consumption. Also, the digital circuitry 1102 provides integrated hardware for the high speed and repetitive processing of data from the analog circuitry 1104.

Reliable running at very low voltage, for example outside in very cold weather at the end of the life of the Lithium battery 1110, is more readily achieved with an integrated solution, since the entire ASIC design can be characterised for low voltages which would not be possible using various standard components. Average power consumption is reduced greatly by giving the ASIC 1100 control over the power supplies to the components 1112 to 1116. The processor core consumes very low power when it is in the sleeping state, and zero power when the clock signal PCLK is stopped. Simple timing circuitry on the ASIC can be provided to start the processor clock only intermittently, when measurements are required to be taken, and the processor 100 when running can take further control of the other circuitry of the ASIC 1100 and the printed circuit board 1105, activating only those circuits which are required at a given time. The power consumption of the processor is very low even when it is running, due to various features mentioned above. Furthermore, because the arithmetic instructions are powerful for the size of the processor, the processor does not need to execute so many instructions for a given calculation.

To maintain economies of scale while providing a domestic gas meter that can meet different national standards and allow several product variants, the ASIC 1100 (including its stored program) should be the same for all such variants. This would not be achieved readily in conventional microprocessor architecture, but in the present example the difference between the products is implemented in the low cost external microprocessor 1114 and the EEPROM 1116. This can be changed (re-programmed) at a much later stage in product development, as each new requirement comes to light, or even after installation in the field. The serial interface 1118 enables the external microprocessor 1114 to have direct read and write access to the memory space in the ASIC, so that parameters of the measuring process can be changed as necessary. It is an advantage of the serial interface design presented herein that, at the time of fixing the ASIC design and programming, it need not be determined to which memory locations the external microprocessor will need to have access. Therefore, there is great flexibility for unforeseen modifications in the product development.

The novel processor and interface architectures allow “verifiable” design and programming, so that the risk of errors in the final ASIC can be reduced, despite the integration of processor and program ROM on-chip. Flexibility of the design for future product requirements is kept open by means of the serial interface.

With regard to cost, the processor core 100 can be very small (only 3000 gates), requiring only about 1 mm² of silicon. The unit cost of such a silicon area, in volume manufacture, is about £0.15 at the present filing date, while it is not possible to buy a standard external 16 bit microprocessor for such a low price. Furthermore, by implementing so many components on the ASIC, the product has very few components in total, giving very low PCB, assembly and test costs. Reliability is also enhanced, because failures in electronics tend to happen because of broken or poor connections, and the number of connections in the design of FIG. 11 is very low.

Further Notes

Those skilled in the art will recognise that the detailed implementation of a microprocessor or other circuit embodying any aspect of this invention need not be limited to the examples given above. Of course the details of the instruction set can be changed to suit a given application; the widths of address and data busses, and the widths of various fields in the instruction word of the processor, can be changed also. Even at a more general level, the scope of the present invention encompasses many individual functional features and many sub-combinations of those functional features, in addition to the complete combination of features provided in the specific embodiment. Whether a given functional feature or sub-combination is applicable in a processor having a different architecture, for example a processor with pipelined instruction decoding and execution, will be readily determined by the person skilled in the art, who will also be able to determine the adaptations or constraints imposed by the changed architecture.

It will also be appreciated that, whereas the program instructions and initial data for the processor operation are permanently fixed in ROM storage on-chip, embodiments are perfectly feasible for prototyping and/or final production in which the ROM is replaced by E²PROM (electrically erasable programmable read only memory) or one-time-programmable ROM, where the processes used for manufacture (and the costs) will permit.

All or part of the program store may in some cases need to be off-chip. If the pin count associated with the architecture is too high, it may be reduced for example by providing an 8-bit program ROM, and performing multiple accesses to build up each instruction word.

Concerning the novel architecture for arithmetic operations, the provision of the 32-bit shift unit separately from the 16-bit ALU allows a combination of high signal processing performance and small circuit size. A conventional processor having shift functions in the ALU would typically provide a 32-bit wide ALU or forego the 16×16 multiply and divide instructions. Of course, the ALU of the present processor could be provided with 16-bit shift functions if desired.

Also, although a single bit shift function has been provided in the above embodiment, requiring n cycles for an n-bit shift, there is also the possibility to include a “barrel shifter” to allow n-bit shifts in a single cycle. The choice of 1-bit or n-bit shift circuitry is a trade-off between desired processing speed and circuit size.
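By way of illustration only, the co-operation of a 16-bit adder with a 32-bit single-bit shifter can be sketched in software as a conventional shift-and-add multiplication. The sketch below is not taken from the embodiment: the function name `multiply_16x16`, the variable names `hi` and `lo` for the register pair, and the exact ordering of add and shift within each step are assumptions made for the purpose of the example.

```python
MASK16 = 0xFFFF  # width of the ALU data path assumed in this sketch


def multiply_16x16(multiplicand: int, multiplier: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply using 16-bit adds and 1-bit shifts."""
    assert 0 <= multiplicand <= MASK16 and 0 <= multiplier <= MASK16
    hi, lo = 0, multiplier  # hypothetical register pair: hi accumulates, lo holds the multiplier
    for _ in range(16):  # one add-and-shift step per multiplier bit
        carry = 0
        if lo & 1:
            total = hi + multiplicand  # 16-bit add; bit 16 of the sum is the carry out
            hi, carry = total & MASK16, total >> 16
        # 32-bit shift: move {carry, hi, lo} right by a single bit position
        lo = (lo >> 1) | ((hi & 1) << 15)
        hi = (hi >> 1) | (carry << 15)
    return (hi << 16) | lo


print(hex(multiply_16x16(0x1234, 0x00FF)))
```

Each pass through the loop corresponds to one single-bit shift, which is why a multiplication of this kind occupies several cycles when only a 1-bit shifter is provided; a barrel shifter would not by itself shorten this particular loop, but would shorten plain n-bit shift operations.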

Concerning the serial interface (SIF) described above, it will be apparent that similar functions can be performed by other types of serial or parallel interface, and it will also be apparent that the SIF connections and protocols can be applied for communication between different apparatuses or devices which need not be ASICs or in any way similar to the above described processor.

Other standard interfaces well known in the art are the I²C 2-wire interface of Philips Electronics NV, the SPI interface of Motorola Corp., and the SCI interface of Hitachi. The SIF described above can readily be interfaced, for example, to a Motorola processor (for example MC68HC11) having an SPI type interface. Another interface known in the art is the “Microwire” interface of National Semiconductor Corp, with particular application to E²PROM components.

The SIF described above has the advantage of a very simple and robust structure and protocol, with a single master, and can be used in many of the same applications as the known interfaces mentioned above.

Although the Microwire interface defines a simple fixed master and slave relationship, the known interfaces like I²C and Microwire impose fixed word lengths and strict timing constraints on the slave device. The SIF as described herein allows variable word lengths, and each of master and slave can take as long as it needs to respond to the interface. This avoids the need to interrupt the flow of control in the ASIC at short notice, in response to unpredictable external stimuli, thereby simplifying the design and verification of new ASIC designs. In particular, the programmer of the present ASIC can keep track of real time simply by calculation from the clock speed and known instruction execution times. In a conventional processor design, where interrupts may occur in response to external stimuli, the current location in the program is no guide to elapsed real time, and other timer mechanisms, typically implemented by further interrupts, are required to implement real-time dependent operations. Also, since there is no speed constraint, the SIF can be used to read values from any of the memory-mapped I/O devices, which might require a lengthy wait for response from some off-chip peripheral (keyboard or sensor).
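The calculation of elapsed real time from the clock speed and the instruction execution times may be sketched as follows. The mnemonics, cycle counts and clock frequency used here are purely hypothetical values chosen for the example and are not the figures of the embodiment; the function name `elapsed_seconds` is likewise an assumption.

```python
def elapsed_seconds(instruction_trace, cycles_per_instruction, clock_hz):
    """Sum the fixed cycle cost of each executed instruction and convert to seconds."""
    total_cycles = sum(cycles_per_instruction[name] for name in instruction_trace)
    return total_cycles / clock_hz


# Hypothetical cycle counts and clock frequency, for illustration only.
CYCLES = {"ADD": 1, "MULT": 18, "SIF": 1}
trace = ["ADD", "MULT", "SIF", "ADD"]
print(elapsed_seconds(trace, CYCLES, clock_hz=32768))
```

Because no interrupt can lengthen the path between two points in the program, and because the SIF instruction is taken here to cost the same number of cycles whether or not an external access is pending, a sum of this kind depends only on the instructions executed.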

FIG. 12 shows how it is possible to enable the SIF architecture to allow a microcomputer to address plural slave ASICs, for example by providing a separate “chip select” line CS to each slave. The data and clock lines then become effectively a serial data bus. However, unlike conventional serial buses, each slave is free to define its own word length, according to need. Therefore, a slave need not provide a 36-bit shift register when only eight bits are ever needed by that slave, and the master need not waste time sending more bits than necessary to any slave. In FIG. 12, an external microprocessor 1200 is shown as master, while three ASIC devices 1202, 1204 and 1206 are shown as slaves. The serial data bus comprises a common clock line SIF_CLK. The serial data line which inputs to the slaves is shown as SIF_MOSI (“master out, slave in”) while the common data line on which data is output by the slaves is shown as SIF_MISO (“master in, slave out”). The bi-directional control line SIF_LOADB is shown connected in common to all devices, and corresponds to the control line SER_LOADB of the earlier described embodiments. The master 1200 also has individual chip select outputs SIF_CS0, SIF_CS1 and SIF_CS2, connected to chip select inputs of the slaves 1202, 1204 and 1206 respectively.

In operation, when SIF_CS is low for a given slave ASIC, the SIF_MISO connection of that slave is put into its high impedance state using a tri-state buffer, and also the SIF_LOADB output of that slave is put in the high impedance state, using the tri-state buffer already provided (see FIG. 7). In the present embodiment, the chip select signal does not affect SIF_CLK or SIF_MOSI in any way. Therefore, it is still possible to clock new data into the shift register of the interface, even when the chip select is low. However, the data output of the slave SIF_MISO is disconnected from the register, so that no data will be clocked out.

When the chip select is low, the slave will also ignore SIF_LOADB when it is pulled low by the master, and in particular will not latch SIF_LOADB low and will not queue up a SIF access operation for the next SIF instruction cycle.

When SIF_CS is high for a given slave, however, the SIF_MISO output is enabled, and the SIF_LOADB input/output line is enabled in exactly the manner described above with respect to FIG. 7. Extension of the interface handshake mechanism to provide for plural masters is equally feasible, but will not be described herein.
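The chip select behaviour described with reference to FIG. 12 may be modelled in software along the following lines. This is a behavioural sketch only and not a description of the circuit of FIG. 7: the class name `SifSlave`, its method names, the use of `None` to stand for the high impedance state, and the choice to present the output bit before the register shifts are all assumptions of the example.

```python
class SifSlave:
    """Behavioural model of one slave on the shared serial bus (illustrative only)."""

    def __init__(self, word_length):
        self.word_length = word_length      # each slave is free to choose its own length
        self.bits = [0] * word_length       # bits[0] is the input end, bits[-1] the output end
        self.chip_select = False
        self.access_pending = False

    def clock(self, mosi):
        """One SIF_CLK cycle: always shift SIF_MOSI in; drive SIF_MISO only if selected."""
        out_bit = self.bits[-1]
        self.bits = [mosi] + self.bits[:-1]
        return out_bit if self.chip_select else None  # None models high impedance

    def loadb_pulled_low(self):
        """Master pulls SIF_LOADB low: an access is queued only by a selected slave."""
        if self.chip_select:
            self.access_pending = True
        return self.access_pending


slave = SifSlave(word_length=8)
pattern = [1, 0, 1, 1, 0, 0, 1, 0]
deselected_outputs = [slave.clock(bit) for bit in pattern]   # data goes in, nothing comes out
slave.chip_select = True
selected_outputs = [slave.clock(0) for _ in range(8)]        # the same bits now appear on SIF_MISO
print(deselected_outputs, selected_outputs)
```

In this model the eight-bit slave never needs a longer register, and the master clocks only as many bits as that slave defines.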

Another possible modification of the SIF described above concerns the SIF read operation. Since no address value needs to be present in the shift register bits SIF_ADDR[15:0] when the value is read out of the ASIC, it would be possible for example for every SIF read operation to provide access not only to the particular memory location requested, but also to supply a fixed set of status values such as the flags register or program counter PC. These values could be available at little extra cost, being loaded into the address field of the interface register at the same time as the data field is loaded, and need not be clocked out by the external device if they are not of interest.

Another feature of the SIF type of interface is that the separate data input and output wires SER_IN and SER_OUT can be used simultaneously to read data from the ASIC and to load another word into the SIF shift register on the ASIC, to set up the next read or write operation. This potential for parallel operation of SER_IN and SER_OUT at each cycle of the clock SER_CLK is illustrated by the double broken lines in between the flowchart step 630 and the step 602 or 622 in FIG. 8, and can be applied in other interfaces having a separate data wire for each direction.
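The simultaneous use of the two data wires may be illustrated by the following sketch, in which sixteen clock cycles both deliver the data of a completed read and load the address for the next one. The register layout assumed here (a 16-bit address field at the input end followed by a 16-bit data field at the output end, with any control bits omitted), the most-significant-bit-first ordering, and the helper names are assumptions of the example rather than details of the embodiment.

```python
FIELD = 16  # assumed width of both the address field and the data field


def to_field(value):
    """Represent a 16-bit value as a list of bits, least significant bit at index 0."""
    return [(value >> i) & 1 for i in range(FIELD)]


def from_field(bits):
    """Inverse of to_field."""
    return sum(bit << i for i, bit in enumerate(bits))


def exchange(register, value_in):
    """Clock 16 times: shift value_in in (MSB first) while collecting the bits shifted out."""
    value_out = 0
    for i in reversed(range(FIELD)):
        value_out = (value_out << 1) | register[-1]          # bit leaving on the output wire
        register = [(value_in >> i) & 1] + register[:-1]     # bit arriving on the input wire
    return register, value_out


# Register after a read of hypothetical address 0x0010 returned hypothetical data 0xBEEF.
register = to_field(0x0010) + to_field(0xBEEF)
register, data_read = exchange(register, 0x0020)
print(hex(data_read), hex(from_field(register[:FIELD])))
```

After the exchange the data word has been read by the external device and the address field already holds the next address, so that neither device has had to shift the entire register.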

An interface of the SIF type can also be employed in an apparatus having memory and CPU (central processing unit) on different chips. The SIF shift register and the multiplexer which governs access to the address bus could then be provided by components internal or external to either chip.

Concerning the SIF instruction in the processor instruction set described above, it will be apparent that the concept of releasing the memory address bus or register addressing circuitry for external access during a particular instruction (effectively an NOP instruction) is broadly applicable to a wide range of processor architectures. Such a concept can provide both testability and off-chip communications for normal operation in a single functional unit. Similarly, the scheme of hand-shaking provided by the SER_LOADB line in the SIF interface circuitry can be applied in a wide range of interface types, and is particularly advantageous where the speed and timing capabilities of two devices wishing to communicate are either unknown or anyhow widely disparate.

Also, rather than depending on the presence of a stop state or a specific SIF instruction code within the stored program, the processor sequencing circuitry can simply provide a regular time window, for example one cycle every 128 cycles, in which memory is addressed by the SIF and not by the processor itself.

Alternatively, or in addition, the SIF instruction code may be replaced by a family of instruction codes, allowing different types of serial interface access. For example, a programmer may be happy to allow SIF read operations more often during the running of the program than SIF write operations. In such a case, separate instructions SIF_READ_ONLY (allowing the interface only to read from the memory space) and SIF_ALL (allowing both read and write operations) might be defined, for example. Then, if a SIF_WRITE instruction is received via the interface, this will only be processed during the next SIF_ALL cycle, irrespective of how many SIF_READ_ONLY cycles may have been executed in the meantime. Similarly, different instructions might be provided to allow access to different parts of the processor address space at different times. Of course, for each instruction of this type, it is still the case that the programmer determines only the timing of the memory access, while the specific memory access operation desired is defined by the external apparatus.
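The deferral of a write request until the next SIF_ALL cycle may be sketched as follows. The function name `first_servicing_slot` and the representation of the program as a list of access windows are assumptions made for the example; only the instruction names SIF_READ_ONLY and SIF_ALL are taken from the description above.

```python
def first_servicing_slot(slots, request):
    """Return the index of the first access window able to service the pending request.

    slots   -- sequence of "SIF_READ_ONLY" / "SIF_ALL" windows, in program order
    request -- "read" or "write", as defined by the external apparatus
    """
    for index, slot in enumerate(slots):
        if slot == "SIF_ALL" or (slot == "SIF_READ_ONLY" and request == "read"):
            return index
    return None  # no suitable window in this stretch of the program


windows = ["SIF_READ_ONLY", "SIF_READ_ONLY", "SIF_ALL"]
print(first_servicing_slot(windows, "read"), first_servicing_slot(windows, "write"))
```

The program fixes where the windows fall; the external apparatus decides what, if anything, is done in each.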

The de-bug control circuitry, featuring the control lines STOPB and RUN_STEP, is similarly applicable in a wide range of processor architectures and applications. While this control mechanism is particularly useful in combination with the serial interface functions described, these features are also of use independently. The provision of a breakpoint instruction (BRK above) which is conditional on the existence of the de-bug mode is also advantageous in itself, particularly where the microprocessor control program is stored in ROM memory on chip. As described above, the breakpoint instruction BRK can be present in all prototype and final versions of the stored program, but will be effectively ignored by the processor during normal operation.

The SLEEP operation, being defined in the instruction set, also gives the programmer control of the point in program execution at which the SLEEP state will be entered. Even when the SLEEP state is to be commanded from outside the processor, using the WAKE_UP signal line, it will always be delayed until the processor is at a certain point in the program execution, which allows a more verifiable design. Similarly, the processor will always wake up at a known point in the program, which in this embodiment is the instruction after SLEEP.

The above and other generalisations will be obvious to the skilled reader and are within the scope of the present invention. Although various specific aspects of the invention are defined above and in the attached claims, the applicant reserves the right to claim any novel feature or novel combination of features disclosed explicitly or implicitly herein.

What is claimed is:
 1. A data processing apparatus including: a processor constructed to operate under control of a stored program comprising a sequence of program instructions selected from a predetermined instruction set; an interface circuit which is operable to provide an interface for an external apparatus to signal a request for access to one of a plurality of storage locations within the processor, said one of a plurality of storage locations being specified independently of the stored program in a communication request supplied by the external apparatus to the interface circuit; and control means operable to cause the processor to provide access between the specified storage location and the interface circuit in response to such a communication request only at predetermined points in the execution of the stored program, said control means being operable to fix periods of time for providing such access relative to the sequence of program instructions such that execution timing of the stored program is independent of whether or not such a communication request is supplied to said interface circuit by said external apparatus.
 2. An apparatus according to claim 1, wherein said control means comprises a generic communication instruction (SIF) of the instruction set, the generic communication instruction having a fixed execution time by said processor.
 3. An apparatus according to claim 1, wherein the processor and stored program are provided on a single integrated circuit.
 4. An apparatus according to claim 1, wherein said interface circuit is operable to provide a serial interface to the external apparatus.
 5. An apparatus according to claim 4, wherein said serial interface includes separate data lines for input and output of data to the interface circuit, which allows data to be shifted serially into the interface circuit at the same time as data is being shifted serially out of the interface circuit.
 6. An apparatus according to claim 5, wherein said serial interface further includes a clock line for synchronizing data input and output asynchronously with respect to operation of said processor.
 7. An apparatus according to claim 1, wherein the communication request includes a specific communication instruction loaded by the external apparatus into an interface register of the interface circuit, and wherein the control means is operable to provide access between the specified storage location and the interface register under control of the specific communication instruction.
 8. An apparatus according to claim 7, wherein the interface circuit includes at least one control line which has an active state which is setable by either of the interface circuit and the external apparatus, wherein said external apparatus is operable to signal said request for access by setting the active state of said at least one control line, wherein said interface circuit is operable to detect the setting of said active state of said at least one control line by said external apparatus and to maintain the active state on said at least one control line until access has been provided between the specified storage location and the interface circuit during one of said periods of time and wherein said interface circuit is operable to release said control line when it is ready to receive a subsequent communication request.
 9. An apparatus as claimed in claim 8 wherein said idle state of said at least one control line is setable by said external apparatus and wherein a specific communication instruction of said subsequent communication request will not be read internally from the interface register and processed until the control line has been released by both the interface circuit and the external apparatus for a predetermined time and then again set to the active state by the external apparatus.
 10. An apparatus according to claim 8, wherein, at least for a specific communication instruction which is recognized by the processor as requiring information in response, said processor is operable to load said information into the interface register of the interface circuit during said time period, before the interface circuit releases the control line, which allows said information to be read from the interface register by the external apparatus.
 11. An apparatus according to claim 7, comprising circuitry which is operable to interpret the specific communication instruction in accordance with pre-determined rules.
 12. An apparatus according to claim 7, wherein the processor has a basic data width of sixteen bits, and the interface register contains more than thirty bits.
 13. An apparatus according to claim 7, wherein said specific communication instruction includes an address field which is interpreted by the processor as specifying said storage location within a storage space of the processor, and wherein the control means is responsive to the specific communication instruction during said time period to allow reading, writing or a selection of reading and writing to be performed between the interface circuit and the specified storage location.
 14. An apparatus according to claim 13 wherein said storage space comprises data space, register space, and/or program memory of the processor.
 15. An apparatus according to claim 14, wherein said processor comprises a memory access interface which is operable to distinguish between memory accesses performed under normal program control and memory accesses performed in response to the communication request.
 16. An apparatus according to claim 13, wherein said specific communication instruction is a read instruction and wherein said external apparatus is operable to load an address into an address field portion of the interface register to initiate one read operation without loading the entire interface register.
 17. An apparatus according to claim 13, wherein said specific communication instruction is a read instruction, wherein said interface register includes a data portion for data read from said storage location, separate from an address field portion and wherein said external apparatus is operable to read data from the data portion of the interface register to complete one read operation without reading out the entire interface register.
 18. An apparatus according to claim 13, wherein said specific communication instruction is a read instruction, wherein said interface register includes a data portion for data read from said storage location, separate from an address field portion and said interface register is arranged to allow the external apparatus to read data from the data portion of the interface register to complete one read operation while an address is being loaded into the address field portion of the interface register to initiate another read operation.
 19. An apparatus according to claim 13, wherein said external apparatus is operable to select a reading or writing operation within the processor under control of a selection field in the specific communication instruction loaded into the interface register.
 20. An apparatus according to claim 13, wherein said interface register is a shift register which is loadable bit-serially under control of said external apparatus.
 21. An apparatus according to claim 20, wherein said interface register includes a separate data portion and address portion, and wherein said address portion is nearer than the data portion to an input end of the shift register.
 22. An apparatus according to claim 21, wherein said shift register further includes a read/write selection portion nearer than the data portion to the input end of the shift register.
 23. An apparatus according to claim 1, wherein the processor is responsive to a predetermined external signal so as to enter a state in which program controlled operation of the processor is suspended, and wherein said control means is responsive to received communication requests to provide said access at any time during said suspended state.
 24. An apparatus according to claim 1, wherein said processor has no interrupt handling facility.
 25. An apparatus according to claim 1, further comprising mixed analogue and digital signal processing circuitry integrated with said processor in a single integrated circuit.
 26. An apparatus according to claim 1, wherein address lines and read/write select line within the processor are operable to carry address and selection signals for accessing said storage locations, and wherein said control means is operable to select, for application to said address lines and said read/write select line, either address and selection signals generated by addressing means of the processor, or, during said periods of time, address and selection signals supplied by the interface circuit.
 27. An apparatus according to claim 1, wherein said processor comprises: an instruction decoding circuit for implementing control within the processor in accordance with the stored program; an arithmetic unit having input and output data paths of width n bits; a register of width greater than n bits connectable under control of the instruction decoding circuit to at least one of the input and output paths of the arithmetic unit; and a shifting circuit separate from the arithmetic unit for performing shift operations of said greater width using said register.
 28. An apparatus according to claim 27 wherein said instruction decoding circuit is responsive to a predetermined division instruction to control the arithmetic unit, the shifting circuit and the register so as to divide two values of width n bits, the quotient and remainder being obtained in the register.
 29. An apparatus according to claim 28, wherein the division operation is performed over plural operating cycles of the arithmetic unit and the shifting circuit.
 30. An apparatus according to claim 27, wherein the shifting circuit is operable to perform only single bit position shifting per operating cycle.
 31. An apparatus according to claim 27, wherein dedicated unidirectional data paths are provided (i) between the register output and the input path of the arithmetic unit and (ii) between the arithmetic unit output and the register input.
 32. An apparatus according to claim 27, comprising at least one external control line, and wherein at least one instruction of the instruction set is decoded differently by said decoding circuit depending on a signal present on the external control line.
 33. An apparatus according to claim 27, wherein the processor has a basic instruction cycle sub-divided into plural internal clock states, and wherein, for at least one combinational logic circuit having plural input lines and functioning under control of each stored program, means are provided to sample and latch input values for the combinational logic circuit only at a defined state or states within the operational cycle.
 34. An apparatus according to claim 33, wherein said combinational logic circuit comprises the arithmetic unit of the processor.
 35. An apparatus according to claim 34, wherein said combinational logic circuit further comprises the instruction decoding circuit of the processor.
 36. An apparatus according to claim 27, wherein said shifting circuit is physically connected between the output path of the arithmetic unit and data inputs of said register, with feedback also from data outputs of said register.
 37. An apparatus according to claim 27, wherein said register of greater width comprises a pair of registers each of width n, independently connectable to an input or output path of said arithmetic unit.
 38. An apparatus according to claim 27, wherein said instruction decoding circuit is responsive to a predetermined multiplication instruction to control the arithmetic unit, the shifting circuit and the register so as to multiply two values of width n bits, the result being obtained in the register.
 39. An apparatus according to claim 38, wherein the multiplication operation is performed over plural operating cycles of the arithmetic unit and the shifting circuit.
 40. An apparatus according to claim 1, wherein the execution of each program instruction takes a number of processor execution cycles, dependent on the type of instruction, and wherein a further instruction is not executable until a current instruction has been completely executed.
 41. An apparatus according to claim 1, wherein said processor comprises: an instruction decoding circuit for implementing control within the processor in accordance with the stored program; an arithmetic unit having input and output data paths of width n bits; a register space of width n bits connectable under control of the instruction decoding circuit to either the input or output paths of the arithmetic unit, wherein dedicated unidirectional data paths are provided (i) between the register space output(s) and the input path of the arithmetic unit and (ii) between the arithmetic unit output path and the input(s) of the register space.
 42. An apparatus according to claim 41, wherein a further dedicated path is provided from a program counter register to one data register of the processor for storing a subroutine return address, said data path including an incrementer.
 43. An apparatus according to claim 1, further comprising means for implementing a state of low power consumption in which execution of said program by said processor is suspended, and means for ending said suspended state so that execution of said program continues from the next instruction in the stored sequence without execution of instructions stored outside said sequence.
 44. An apparatus according to claim 43, wherein said suspended state is entered in response to a signal applied to the processor, but only at a time in the instruction sequence defined by inclusion of a specific instruction.
 45. An apparatus according to claim 43, wherein said suspended state is entered in response to a sleep instruction which forms part of said stored program.
 46. An apparatus according to claim 1, further comprising: means for imposing a state of low power consumption in which execution of said stored program by said processor is suspended and in which a clock signal for said processor is isolated from the processor, while the clock signal continues running; and monitoring means responsive to any of a plurality of external signals for ending said suspended state by reapplying said clock signal to the processor, wherein said monitoring means comprises: plural individual monitoring circuits for detecting predetermined changes in respective ones of the external signals; and a common trigger circuit responsive to outputs of the individual monitoring circuits for re-applying said clock signal to the processor, and wherein the individual monitoring circuits, but not the common trigger circuit, are isolated from the running clock signal during said suspended state.
 47. An apparatus according to claim 46, wherein at least one of said individual monitoring circuits comprises: means responsive to said clock signal prior to the suspended state for storing a value of the corresponding external signal; and asynchronous circuit means for, during the suspended state, comparing the external signal with the stored value.
 48. An apparatus according to claim 46, wherein said read or write operation is performed without interpretation by said processor.
 49. An apparatus according to claim 1, wherein the apparatus has at least one external control line, and wherein at least one instruction of the instruction set is an instruction which will either halt the execution of the stored program or will effect no meaningful operation, depending on a signal applied to said external control line.
 50. An apparatus according to claim 49, comprising means for signaling externally that execution has been halted.
 51. An apparatus according to claim 49 comprising at least one further external control line and wherein said apparatus is responsive to signals applied to said control lines so as to: (i) start and stop the execution of the stored program; (ii) to halt the execution of the stored program at a predetermined instruction; or (iii) to single step through the sequence of program instructions of said stored program.
 52. In combination, a data processing apparatus and an external apparatus, the data processing apparatus including: a processor constructed to operate under control of a stored program comprising a sequence of program instructions selected from a predetermined instruction set; an interface circuit which is operable to provide an interface for the external apparatus to signal a request for access to one of a plurality of storage locations within the processor, said one of a plurality of storage locations being specified independently of the stored program in a communication request supplied by the external apparatus to the interface circuit; and control means comprising (i) means operable to provide fixed periods of time during an execution timing of the stored program by the processor during which said processor does not access said storage locations under control of said stored program, said fixed periods being provided at predetermined points in the execution of the stored program; and (ii) means operable to cause the processor to provide access between the specified storage location and the interface circuit in response to such a communication request only during said fixed periods of time whereby the execution timing of the stored program by the processor is independent of whether or not such a communication request is supplied to said interface circuit by said external apparatus.
 53. The combination of claim 52, wherein said communication request includes a read instruction loaded by the external apparatus into an interface register of the interface circuit, which interface register includes a data portion for data read from said storage location, separate from an address field portion and wherein said external apparatus is operable to read data from the data portion of the interface register to complete one read operation without having to read out data from the entire data portion of the interface register.
 54. A data processing apparatus including: input means for receiving input data; a processor operable to process the input data in accordance with a stored program comprising a sequence of program instructions selected from a predetermined instruction set and using a plurality of storage locations within the processor; an interface circuit for providing an interface for an external apparatus to signal a request for access to one of said plurality of storage locations within the processor, said one of a plurality of storage locations being specified independently of the stored program in a communication request supplied by the external apparatus to the interface circuit; and control means comprising (i) means operable to provide fixed periods of time during an execution timing of the stored program by the processor during which said processor does not access said storage locations under control of said stored program, said fixed periods being provided at predetermined points in the execution of the stored program; and (ii) means operable to cause the processor to provide access between the specified storage location and the interface circuit in response to such a communication request only during said fixed periods of time whereby the execution timing of the stored program by the processor is independent of whether or not such a communication request is supplied to said interface circuit by said external apparatus.
 55. A data processing apparatus including: a processor constructed to operate under control of a stored program comprising a sequence of program instructions selected from a predetermined instruction set; an interface circuit which is operable to provide an interface for an external apparatus to signal a request for access to one of a plurality of storage locations within the processor, said one of a plurality of storage locations being specified independently of the stored program in a communication request supplied by the external apparatus to the interface circuit; and a controller operable to provide fixed periods of time during an execution timing of the stored program by the processor during which said processor does not access said storage locations under control of said stored program, said fixed periods being provided at predetermined points in the execution of the stored program, the controller also being operable to cause the processor to provide access between the specified storage location and the interface circuit in response to such a communication request only during said fixed periods of time whereby the execution timing of the stored program by the processor is independent of whether or not such a communication request is supplied to said interface circuit by said external apparatus.
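
By way of illustration only, and forming no part of the claims, the fixed-period access arrangement recited in claims 52, 54 and 55 may be modelled in software. The following Python sketch is a simplified model under stated assumptions: the four-cycle instruction, the position of the reserved cycle, the request tuples and the names `run`, `CYCLES_PER_INSTRUCTION` and `SIF_SLOT` are all hypothetical and are not taken from the specification. It shows only that, when external requests are serviced exclusively in a reserved cycle of each instruction, the total execution time of the stored program is the same whether or not requests are supplied.

```python
from collections import deque

# Assumed values for illustration; the specification does not fix them here.
CYCLES_PER_INSTRUCTION = 4  # every instruction takes the same number of cycles
SIF_SLOT = 3                # cycle in each instruction reserved for the interface


def run(program, memory, requests=()):
    """Execute `program` and return (total_cycles, read_results).

    program  : list of (address, value) stores made by the stored program
    requests : ("read", address) or ("write", address, value) tuples
               supplied by the hypothetical external apparatus
    """
    pending = deque(requests)
    read_results = []
    cycles = 0
    for address, value in program:
        for phase in range(CYCLES_PER_INSTRUCTION):
            if phase == 0:
                # The stored program accesses memory only in this cycle.
                memory[address] = value
            elif phase == SIF_SLOT and pending:
                # Reserved period: the program never touches memory here,
                # so one external request can be serviced without delay.
                request = pending.popleft()
                if request[0] == "read":
                    read_results.append(memory[request[1]])
                else:
                    memory[request[1]] = request[2]
            cycles += 1  # the cycle elapses whether or not it was used
    return cycles, read_results


program = [(0, 10), (1, 20), (2, 30)]

idle_memory = [0, 0, 0, 0]
idle_cycles, _ = run(program, idle_memory)

busy_memory = [0, 0, 0, 0]
busy_cycles, reads = run(
    program, busy_memory, [("read", 0), ("write", 3, 99), ("read", 3)]
)
```

In this model `idle_cycles` and `busy_cycles` are equal, because the reserved cycle elapses in every instruction regardless of whether a request is pending. A real implementation would perform this arbitration in hardware rather than in a software loop.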